Detailed Description Text (60) :

The sequence and genomic structure of the b.IAP gene show high homology to all known
TSAP genes. The smallest exon, exon VII, is only 73 bp long, while the longest exon,
exon XI, is approximately 1.1 kb long. The exact length of exon XI cannot be
determined since no cDNA with a poly-A tail had been isolated. The estimate given is
based on the identification of a putative poly-adenylation site AATAAA (bp 5183-5188)
in the 3' non-coding region of the gene (underlined in FIG. 1). The introns are among
the smallest introns reported (Hawkins, Nucl. Acids Res. 16:9893-9908 (1988)), as was
found in the case of other TSAP genes as well (Manes et al., supra; Henthorn et al.,
supra; Knoll et al., supra; Millan and Manes, supra). The largest one, splitting exon
V and exon VI, is only 257 bp long. All exon-intron junctions conform to the GT-AG
rule (Breathnach et al., Proc. Natl. Acad. Sci. USA 75:4853-4857 (1978)) and also
conform well to the consensus sequences (C/A)AG/GT(A/G)AGT (SEQ ID NO: 4) and
(T/C).sub.n N(C/T)AG/G (SEQ ID NO: 5) for donor and acceptor sites, respectively
(Mount, Nucl. Acids Res. 10:459-473 (1982)).
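
The GT-AG rule and the donor consensus quoted above can be checked mechanically. The following is a minimal Python sketch, assuming exon and intron sequences are available as plain strings. The function names and the toy sequences are hypothetical and are not taken from the b.IAP gene.

```python
import re

# Donor consensus (C/A)AG/GT(A/G)AGT: three exonic bases followed by six intronic bases.
DONOR_CONSENSUS = re.compile(r"[CA]AGGT[AG]AGT")


def obeys_gt_ag_rule(intron: str) -> bool:
    """Return True if the intron begins with GT and ends with AG."""
    intron = intron.upper()
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")


def matches_donor_consensus(exon_end: str, intron_start: str) -> bool:
    """Compare the last 3 exon bases plus the first 6 intron bases with the consensus."""
    junction = (exon_end[-3:] + intron_start[:6]).upper()
    return DONOR_CONSENSUS.fullmatch(junction) is not None


# Hypothetical toy sequences, for illustration only.
toy_exon_end = "TTCAG"
toy_intron = "GTAAGTCCTTTCTCTCCCACAG"

print(obeys_gt_ag_rule(toy_intron))
print(matches_donor_consensus(toy_exon_end, toy_intron))
```

A real junction rarely matches every consensus position, so in practice a position-by-position score would be more informative than the strict match used here.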
Priority Application Year (1) :
Priority Application Year (2) :
1987 

Priority Application Year (3) : 
1987 

Drawing Description Text (14) : 

FIG. 8 Nucleotide sequence of the 0.7 kb insert of clone pZ 183-1. The proposed
poly(A) attachment site is underlined.

Drawing Description Text (22) : 

FIG. 14 Nucleotide sequence of the genomic form of .lambda.5 containing all cDNA
sequences and the amino acid sequences deduced from them. cDNA sequences of the
pZ 183-1a clone are boxed. The sequences are divided into three parts which show the
three exons and genomic sequences 5' and 3' adjacent to them. a and b indicate the
major sites of initiation of transcription determined by the primer extension
experiment shown in FIG. 17. The 18 nucleotides synthesized as a primer for the
extension method are indicated with a broken line over the sequence. Sequences boxed
in broken lines indicate the 5' part of exon I determined by this primer extension
method (see FIG. 17). GT (donor) and AG (acceptor) splicing signal sequences of
introns are underlined. The triplet indicated by three closed circles (.cndot.) above
the sequence shows the possible translation start codon ATG. The sequence underlined
by six circles shows the poly(A) addition signal sequence.

Drawing Description Text (27) : 

FIG. 17 Primer extension analysis of the 5' end of .lambda.5 mRNA. The synthetic
oligonucleotide 5'-CAGAGTCTGTCCTACTCT-3', complementary to an 18 nucleotide sequence in
Exon I (see FIG. 14, position 416-433), was labeled and hybridized to pre-B cell line
40E-1 poly A containing RNA (lane 1) and yeast t-RNA (lane 2).
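
Because the primer is complementary to the mRNA, the stretch of Exon I it anneals to should read as its reverse complement. The sketch below derives that stretch in Python. It assumes only the primer sequence and the position range quoted above; the result has not been checked against FIG. 14, which is not reproduced here, and the function name is illustrative.

```python
# Translation table mapping each DNA base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


primer = "CAGAGTCTGTCCTACTCT"          # primer sequence as given in the text
target = reverse_complement(primer)    # sense-strand stretch the primer anneals to

print(target)                          # AGAGTAGGACAGACTCTG
print(len(primer), 433 - 416 + 1)      # both 18: primer length matches positions 416-433
```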

Drawing Description Text (34) : 

FIG. 22 Nucleotide sequence and deduced amino acid sequence of the V.sub.preB 1 and
V.sub.preB 2 genes. For V.sub.preB 1, nucleotide sequences of both the genomic form
(7pB12-2) and the cDNA (pZ121) are given. The cDNA sequences are identical
to the genomic sequences and are, therefore, only indicated by dashes (--) and
follow the genomic sequence in numbering. Numbering of amino acid residues starts
with -19 as the first position of the leader and proceeds to +1 as the first position
of the mature protein. The sequence marked by closed circles ( ) shows the poly A
addition signal sequence. Arrows (.dwnarw.) indicate potential splice sites. The
asterisk (*) points to the termination codon TAG. DNA sequencing was carried out using
the dideoxy chain termination method by subcloning of fragments into M13mp18 and
M13mp19 vectors, using a 17-mer universal M13 primer (Amersham). The V.sub.preB 2
nucleotide sequence of the genomic form (7pB70-1) is given. (For a restriction map of
7pB70-1, see FIG. 25). V.sub.preB 2 sequences identical to V.sub.preB 1 sequences are
indicated by dashes (--). Wherever the deduced amino acid sequence of V.sub.preB 2
differs from that of V.sub.preB 1, the changed amino acid is given in brackets ( ).

Drawing Description Text (43) : 

FIG. 29 Northern blot analysis of poly(A)-selected RNA from lymphoid cells.

Drawing Description Text (44) :

5 .mu.g poly(A).sup.+ RNA was applied to each lane, electrophoresed and blotted onto
activated DPT paper. Identical filters were probed with: (A) a .sup.32 P-labelled 1.2
kb PstI fragment of pHVpB-6 or (B) a .sup.32 P-labelled 560 bp EcoRI-AccI fragment
from pZ121, a mouse V.sub.preB 1 cDNA clone. The filter in panel (A) was washed
finally in 0.2.times.SSC, 0.1% SDS at 65.degree. C., then exposed to x-ray film
overnight at -80.degree. C. with intensifying screens. The filter in panel (B) was
washed finally in 0.2.times.SSC, 0.1% SDS at 37.degree. C. and exposed as described
above. Sizes of hybridizing bands were calculated using RNA molecular weight standards
purchased from BRL (Bethesda, Md.).

Detailed Description Text (2) : 

The nucleotide sequences which are selectively expressed in pre-B cells may be 
identified by subtraction hybridization, i.e., by a method in which nucleotide 
sequences which are expressed in pre-B cells and in other cells are eliminated and 
only those sequences are selected which are solely expressed in pre-B cells. More 
specifically the nucleotide sequence is identified by preparing a cDNA library from 
poly A containing RNA from a pre-B cell and selecting from that library cDNA clones 
which hybridize to polysomal poly A containing RNA from a pre-B cell and not to 
polysomal poly A containing RNA from a different cell which is not a pre-B cell. 
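
The selection logic of subtraction hybridization can be pictured as a set difference. The toy model below is only a conceptual sketch in Python: it treats each RNA population as a set of transcript labels, all of which are hypothetical, and it ignores hybridization kinetics and abundance entirely.

```python
def subtract_library(pre_b_transcripts, other_cell_transcripts):
    """Keep transcripts present in pre-B cells and absent from the comparison cell."""
    return set(pre_b_transcripts) - set(other_cell_transcripts)


# Hypothetical transcript labels, for illustration only.
pre_b = {"housekeeping_1", "lymphoid_shared_1", "pre_B_specific_1"}
comparison_cell = {"housekeeping_1", "lymphoid_shared_1", "t_cell_specific_1"}

selected = subtract_library(pre_b, comparison_cell)
print(sorted(selected))  # ['pre_B_specific_1']
```

The model also shows why a closely related comparison cell is useful: the more transcripts the two populations share, the fewer non-specific clones survive the subtraction.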

Detailed Description Text (4) : 

For identifying a first nucleotide sequence selectively expressed in pre-B cells, mRNA,
preferably microsome-bound polysomal poly A containing RNA, can be isolated from pre-B
cells by methods known in the art (e.g. Maniatis, et al., supra, pp. 188-209) or as
described in the Example. Since it is difficult to isolate a sufficient number of such 
cells from a mammalian organism, especially from a human organism, a cell line derived 
from such a subset of the lymphoid cell population may be chosen. 

Detailed Description Text (6) : 

A cDNA library, i.e., a collection of DNAs complementary to the poly A containing RNA
from pre-B cells, can be prepared by methods known in the art (e.g. Maniatis, et al.,
supra, pp. 211-246) or as described in the Example. Repeated subtraction hybridization
using polysomal poly A containing RNA from a pre-B cell and from a cell which is not a
pre-B cell is used to isolate a cDNA clone comprising a nucleotide sequence which is
selectively expressed in pre-B cells. Suitable cells which are not pre-B cells are
those distinctly different from cells of the early stages of the B cell lineage but
related to the latter cells, so that they both express a similar subset of genes.
Examples of such cells are lymphoid cells, e.g., T cells, preferably a T cell
hybridoma.

Detailed Description Text (16) : 

In the present invention the selected 70Z/3 cDNA sequence hybridized specifically to a 
1.2 kb size transcript present in a variety of pre-B cells. No hybridization was found 
under the same conditions using poly A containing RNA from mature B cells, plasma 
cells, T cells and other cells which are not from the B cell lineage. The selected 
70Z/3 cDNA sequence is 380 nucleotide pairs long. It represents therefore only a 
partial cDNA of the 1.2 kb long transcript of the gene selectively expressed in pre-B 
cells.

Detailed Description Text (49) : 

Poly A containing RNA was prepared by repeated oligo(dT) cellulose chromatography (P-L 
Pharmacia, Uppsala, Sweden) in the presence of dimethyl sulfoxide as described by 
Bantle (Anal Biochem. 72, 413-427, [1976]). 

Detailed Description Text (51) : 

A subtracted cDNA library for pre-B cell specific clones was constructed essentially
according to the method of Davis et al. (supra). The first cDNA strand was synthesized
with 10 .mu.g of microsome-bound polysomal poly A containing RNA from 70Z/3 in 50 mM
Tris-HCl, pH 8.3, 6 mM MgCl.sub.2, 60 mM NaCl, 20 mM dithiothreitol (DTT), 10 .mu.g/ml
of oligo(dT).sub.12-18, 1 mM of each of the four deoxyribonucleotides, 100 .mu.Ci of
.sup.32 P-dCTP (.about.3000 Ci/mmol), 60 units/ml of placental ribonuclease inhibitor,
100 .mu.g/ml of actinomycin D and 100 units of AMV reverse transcriptase (Stehelin, Co.,
Basle, Switzerland) at 40.degree. C. for 2 hours.

Detailed Description Text (52) : 

After RNA hydrolysis, hybridization reactions were performed in 0.5M phosphate buffer,
5 mM EDTA, 0.1% lithium laurylsulfate with a twenty-fold excess of polysomal
poly A containing RNA from a T cell hybridoma, e.g., from the T cell hybridoma K62, at
68.degree. C. for 16-20 hours to achieve a Cot of 2000. The single-stranded fraction
after hydroxylapatite (Bio-Rad Laboratories, Richmond, CA, USA) chromatography in
0.12M phosphate buffer, 0.1% lithium laurylsulfate, 5 mM EDTA at 65.degree. C. was
re-subtracted under the same conditions as above.

Detailed Description Text (57) : 

Prehybridization and hybridization were performed in 5.times.SSPE (750 mM NaCl, 50 mM
NaH.sub.2 PO.sub.4, 5 mM EDTA; pH 7.4), 5.times.Denhardt's solution (Maniatis et al.,
supra, p. 327), 1% SDS, 1 .mu.g/ml tRNA from E. coli, 1 .mu.g/ml poly(A), 100 .mu.g/ml
of denatured salmon sperm DNA at 68.degree. C. and the filters were washed finally
with 0.1.times.SSC (1.times.SSC=standard saline citrate=150 mM NaCl, 15 mM trisodium
citrate, pH 7.0), 1% SDS at 65.degree. C. The clones which showed positive after 7
days of exposure with intensifying screen at -70.degree. C. were rescreened with both
probes. 200 individual phage clones were selected by this procedure out of 50,000
clones.
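
The buffer strengths quoted above scale linearly, so the component concentrations at any dilution follow from simple arithmetic. The Python sketch below works this out for the 0.1x SSC final wash and for 1x SSPE, using only the concentrations stated in the text. The function and variable names are illustrative.

```python
# Component concentrations in mM, as stated in the text.
SSC_1X = {"NaCl": 150.0, "trisodium citrate": 15.0}
SSPE_5X = {"NaCl": 750.0, "NaH2PO4": 50.0, "EDTA": 5.0}


def rescale(buffer_mm, from_strength, to_strength):
    """Scale every component linearly from one buffer strength to another."""
    factor = to_strength / from_strength
    return {name: round(conc * factor, 3) for name, conc in buffer_mm.items()}


final_wash = rescale(SSC_1X, 1.0, 0.1)   # 0.1x SSC used for the final wash
sspe_1x = rescale(SSPE_5X, 5.0, 1.0)     # 1x SSPE derived from the 5x recipe

print(final_wash)
print(sspe_1x)
```

Under these figures the 0.1x SSC wash contains 15 mM NaCl and 1.5 mM trisodium citrate, a tenth of the salt present at 1x.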

Detailed Description Text (59) : 

One radiolabeled insert DNA fragment designated pZ 183 was selected and used for 
hybridization with a panel of RNA preparations from various cells to show that the 
insert DNA fragment hybridizes selectively to poly A containing RNA from pre-B cells. 
Preferably, different cell lines of different lineages at different stages of 
differentiation are used. These cell lines are considered to be "frozen" at a certain 
stage of normal cell development and, thus, represent the phenotype of these normal 
counterparts.

Detailed Description Text (71) : 

The size of the pZ 183-specific transcript(s) was analyzed in RNA prepared from either
unstimulated or LPS-stimulated 70Z/3 cells, by Northern blot analysis (FIG. 3).
Cytoplasmic RNA was isolated (Chirgwin, et al., Biochemistry 18, 5294-5299, [1979])
from 70Z/3 cells cultured with or without LPS (10 .mu.g/ml) for 12 hours. The RNA was
enriched for poly A containing RNA by oligo(dT) cellulose chromatography. One to
forty micrograms of an RNA sample were electrophoretically separated in an
agarose/formaldehyde gel and transferred to nitrocellulose filters as described
(Maniatis et al., supra, pp 382-389). Prehybridization and hybridization were
performed as described in the sections on differential hybridization but with 50%
formamide at 42.degree. C. The filters were finally washed with 0.2.times.SSC, 1% SDS
at 65.degree. C.

Detailed Description Text (74) : 

RNA was prepared from cells of spleen, thymus, kidney, bone marrow, lung, heart, brain
and liver. Poly A containing RNA isolated from these organs (5 .mu.g each) and poly A
containing RNA from 70Z/3 cells (2 .mu.g) were electrophoretically separated, stained
with ethidium bromide (FIG. 4b), transferred to nitrocellulose filter and hybridized
to radioactive pZ 183 probe (FIG. 4a). Blots were exposed to X-ray film for 7 days at
-70.degree. C. in the presence of an intensifying screen. It is clear from the data
presented in FIG. 4 that this analysis yielded no positive signals from RNA of any of the
organs tested. This indicated that the relative contribution of pre-B cells expressing
pZ 183 in all of these organs must be too low in the total mixture of all other cells
to be detectable by Northern gel analysis.

Detailed Description Text (80) : 

To obtain longer cDNA clones corresponding to the 1.2 kb transcript found in all pre-B
cell lines, a new cDNA library was constructed. A poly A containing RNA preparation
obtained from 70Z/3 cells by the tailing method (Maniatis, et al., supra, pp. 230-242)
was inserted into the pUC-13 vector (Messing et al., supra). The new cDNA library was
screened with the 380 nucleotide pair long insert of the cDNA clone pZ 183. Nine cDNA
clones were found positive in a total of 50,000 clones. The cDNA clone with the
longest insert, i.e., a 0.7 kb insert, called pZ 183-1, was sequenced by the dideoxy chain
termination method (Sanger, et al., supra). FIG. 7 shows the restriction enzyme sites
used for generating the fragments which were used for cloning into M13 phage vector.
The arrows indicate the length of the fragments generated and the direction in which
the fragments were sequenced.

Detailed Description Text (81) : 

The nucleotide sequence of the cDNA clone pZ 183-1 is shown in FIG. 8. The 5' to 3'
orientation was deduced from the location of the poly A tail and the poly A attachment
site (underlined). Nucleotide positions are numbered from the most 5' position of the
insert cDNA fragment.

Detailed Description Text (92) : 

From the above results it was clear that the clone pZ 183-1 did not correspond to the
full-length 1.2 kb long transcript which is selectively expressed in pre-B cells.
Therefore a cDNA library was constructed from poly A containing RNA of the uninduced
murine pre-B lymphoma cell line 70Z/3 by the method described by Okayama et al. (Mol.
Cell. Biol. 2, 161-170, [1982]). 5.times.10.sup.4 individual recombinant clones were
screened with the radioactive insert of pZ 183-1 as described above.

Detailed Description Text (108) : 

The 5' end of mature .lambda..sub.5 mRNA in pre-B cells was determined by primer
extension. A synthetic oligonucleotide 5'-CAGAGTCTGTCCTACTCT-3' complementary to a
part of Exon I was labeled with .gamma.-.sup.32 P ATP using T4 polynucleotide kinase
(Ingraham et al., Mol. Cell. Biol. 6, 2923-2931, [1986]). Poly A containing RNA of
the murine pre-B cell line 40E-1 was purified as described above. 500 ng of labeled
oligonucleotides were annealed to either 10 .mu.g of 40E-1 poly A containing RNA or
yeast transfer RNA in 50 mM Tris/HCl, pH 7.5, 75 mM KCl, 3 mM MgCl.sub.2. Cloned Moloney
murine leukemia virus reverse transcriptase (600 units) (Bethesda Research
Laboratories) was added to each mixture and the reaction was carried out for 1 hour at
37.degree. C. in the presence of 0.5 mM dATP, dGTP, dCTP and dTTP, 10 mM
dithiothreitol, 1 mg/ml BSA, 1000 U/ml of ribonuclease inhibitor (Stehelin, Basle) and
100 .mu.g/ml of actinomycin D.

Detailed Description Text (111) : 

A human fetal liver cDNA library in the vector .lambda.gt11 was obtained from Clontech
Laboratories, Inc. (4055 Fabian Way, Palo Alto, Calif. 94303, USA--Catalog #HL 1005).
2.5.times.10.sup.6 recombinant phage clones were plated onto agar plates (Maniatis, et
al., pp. 68-73) containing 25 .mu.g/ml ampicillin and were transferred to
nitrocellulose filters as described (Benton, et al., supra). The filters were screened
with a .sup.32 P-labelled 700 nucleotide pair fragment generated by cutting the
plasmid pZ 183-1a with PvuII and HindIII (=mouse probe). Following digestion, the
insert was separated by electrophoresis through a 1% low melting point agarose gel and
radiolabeled with .sup.32 P-dATP by the Klenow fragment reaction with random
hexanucleotide primers from calf thymus DNA (Feinberg, et al., supra).

Prehybridization of the filters was done at 37.degree. C. in a solution containing 50%
formamide, 5.times.SSPE, 0.1% SDS, 10.times.Denhardt's solution, 100 .mu.g/ml salmon
sperm DNA, 1 .mu.g/ml E. coli RNA, and 1 .mu.g/ml poly(A). Hybridization of the filters
was done under the same conditions with the addition of 1.times.10.sup.6 cpm/ml of
.sup.32 P-labeled mouse probe.

Detailed Description Text (115) : 

In a library of 10.sup.6 once amplified cDNA clones constructed from 70Z/3 pre-B
lymphoma poly A+ RNA, around 100 positive clones were found. One out of seven strongly
hybridizing clones found was selected because it appeared to have the longest insert.
This clone, named pZ121 (FIG. 20), contains a 780 base pair long pre-B specific insert
including 20 base pairs of poly A. The clone pZ121 was deposited on Apr. 23, 1987 at
the Deutsche Sammlung von Mikroorganismen (DSM) in the form of a sample of E. coli DH1
(pZ121), its accession number being DSM 4088.

Detailed Description Text (152) : 

One of the important characteristics of the mouse V.sub.preB 1 gene is its restricted
expression in mouse pre-B cell lines and, therefore, the pattern of expression of
human V.sub.preB in human lymphoid lines by Northern blot analysis of
poly(A)-selected RNA was examined.

Detailed Description Text (153) : 

Total RNA was isolated from cytoplasm after lysis of cells in 5% citric acid
containing 0.1% NP-40 as described by Schibler et al. (J. Mol. Biol. 142, 93-116
[1980]) and further purified by oligo(dT) cellulose chromatography as described above.
5 .mu.g of poly(A)-enriched RNA were electrophoresed through 1% agarose gels
containing 18 mM Na.sub.2 HPO.sub.4, 2 mM NaH.sub.2 PO.sub.4 and 6% formaldehyde.
Separated RNA was then blotted onto diazotized phenylthioether (DPT) paper (Schleicher
and Schuell).

Detailed Description Text (154) : 

Prehybridization of filters was done at 45.degree. C. in solutions containing
5.times.SSPE (1.times.SSPE=150 mM NaCl, 10 mM NaH.sub.2 PO.sub.4, 1 mM EDTA),
5.times.Denhardt's, 2 mM glycine, 50% deionized formamide, 100 .mu.g/ml salmon sperm
DNA, 20 .mu.g/ml yeast tRNA and 1 .mu.g/ml poly(A). Stringent hybridizations were done
at 45.degree. C. in prehybridization solution lacking glycine but containing 10%
dextran sulfate and 3.times.10.sup.6 cpm/ml .sup.32 P-labelled probe. Cross species
hybridizations were done at 37.degree. C. in hybridization solution containing only
30% formamide. Stringent washes were done at 65.degree. C. in 0.2.times.SSC, 0.1% SDS.
Cross species hybridization experiments were washed finally in 0.2.times.SSC, 0.1% SDS
at 37.degree. C.

Detailed Description Text (155) : 

Human V.sub.preB is expressed only in pre-B cell lines 207, 697 (Findley et al.,
supra) and Nalm-6 (Hurwitz et al., supra), but not in the cell lines LBW-4, Raji and
Jurkat (FIG. 29). The human V.sub.preB poly(A).sup.+ mRNA is 0.85 kb in size, as is
the mRNA of its mouse homologue, V.sub.preB 1. Under low stringency conditions the
mouse V.sub.preB 1 gene also hybridizes to 0.85 kb RNA of human pre-B cell lines (FIG.
29). Similar intensities of hybridization and similar sizes of the RNAs which
hybridize with the mouse V.sub.preB 1 probe and the human probe indicate that the same
RNA molecules may hybridize to both probes. The upper band in FIG. 29B corresponds to
the size of 28S ribosomal RNA and may be the result of crosshybridization of the mouse
V.sub.preB 1 probe to human ribosomal RNA at low stringency. The pattern of RNA
expression of human V.sub.preB, so far, follows that of V.sub.preB 1 and .lambda.5 in
the mouse and indicates that human V.sub.preB is selectively expressed in human pre-B
cell lines, but not in mature B cell or T cell lines.

L12: Entry 1 of 33

File: USPT

Oct 8, 2002

DOCUMENT-IDENTIFIER: US 6462185 B1

TITLE: Floral organ-specific gene and its promoter sequence 



Priority Application Year (1) : 
1996 

Detailed Description Text (9) : 

The nucleotide sequence represented by SEQ ID NO: 3 has the following characteristics
among others.

1. It has 3 transcription initiation points at intervals of several nucleotides, and
these points are all A (adenine) following TC. Specifically, the transcription
initiation points are the adenines (A) at positions 1122, 1125 and 1129.

2. There is a TATA box-like sequence (5'-TATATAA-3') (Corden et al., Science 209,
1406-1414, 1980) 30 bp upstream of the most upstream transcription initiation point.

3. There are 2 ATG sequences in the same reading frame, located 77 bp and 113 bp,
respectively, downstream of the most upstream transcription initiation point.

4. A termination codon (TGA) is located 21 bp upstream of the most upstream ATG (the
first ATG).

Moreover, there are two poly A signal-like sequences (5'-AATAAA-3') (Heidecker
and Messing, Annu. Rev. Plant Physiol. 37, 439-466, 1986) in the terminator region.
The term "terminator region" herein refers to the region which is downstream of the
termination codon.
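
The reading-frame statements above can be verified with modular arithmetic on the quoted distances. The Python sketch below assumes that each distance is measured between the first bases of the two motifs concerned, which the text does not state explicitly. The names are illustrative.

```python
# Distances quoted in the text, in bp.
FIRST_ATG_OFFSET = 77            # first ATG, downstream of the most upstream start point
SECOND_ATG_OFFSET = 113          # second ATG, downstream of the same start point
STOP_UPSTREAM_OF_FIRST_ATG = 21  # TGA, upstream of the first ATG


def same_frame(offset_a: int, offset_b: int) -> bool:
    """Two positions share a reading frame when they are a multiple of 3 apart."""
    return (offset_a - offset_b) % 3 == 0


codons_between_atgs = (SECOND_ATG_OFFSET - FIRST_ATG_OFFSET) // 3

print(same_frame(FIRST_ATG_OFFSET, SECOND_ATG_OFFSET))  # True: 36 bp apart
print(codons_between_atgs)                              # 12
print(STOP_UPSTREAM_OF_FIRST_ATG % 3 == 0)              # True: 21 bp is 7 codons
```

Under the stated assumption, the two ATGs are 12 codons apart, and the TGA lies a whole number of codons upstream of the first ATG, which would place it in the same frame.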

Detailed Description Text (100) : 

The product obtained by using 75FW1 and 175RV1 had an intron of 85 bp having a
5'-GT-AG-3' sequence at both ends thereof. Therefore, when the DNA of the genomic
clone was employed as a template, a PCR product longer than that amplified by using
cDNA as a template was amplified. This intron had a PstI site at the 3'-terminus.

Detailed Description Text (102) : 

The total genomic DNA of rice was digested with the restriction enzyme EcoRI and genomic
Southern analysis was carried out by using the RPC175 gene as a probe. Thus a band
with a weak signal appeared at about 1.6 kb in addition to the one with a strong
signal at about 2.6 kb (FIG. 3). On the other hand, phage DNA was extracted from the
above-mentioned 5 clones and digested with EcoRI followed by Southern hybridization
with the use of RPC175 as a probe. As a result, it was found that the DNA fragments
which hybridized with RPC175 were limited to those of 2.6 kb and 1.6 kb,
which agreed with the results of the Southern analysis on the genomic DNA. It was
known from the nucleotide sequence data that RPC175 had a unique EcoRI site about
70 bp upstream of the 3'-terminus. Therefore, the 1.6 kb fragment with a weak signal
was considered to have been detected due to the homology between the short region
(about 70 bp) from the EcoRI to the poly A sites in the 3'-region of RPC175 cDNA
employed as a probe and the genomic DNA fragment.

Detailed Description Text (115) : 

As a result, it was found that the whole nucleotide sequence of the RPG102 clone
consisted of 2,636 bp and, when compared with the nucleotide sequence of the cDNA
clone RPC175, two introns (85 bp and 199 bp) were contained in the region of the
structural gene. The nucleotide sequences 5'GT and AG3' at both ends were conserved in
both of these introns. The nucleotide sequences in the regions other than these
introns of the genomic clone RPG102 coincided completely with the cDNA clone RPC175. A
poly A signal-like sequence 5'-AATAAA-3' (Heidecker and Messing, Annu. Rev.
Plant Physiol. 37, 439-466, 1986) was located about 90 bp upstream of the EcoRI site in
the 3' side and about 40 bp downstream of the translation termination codon TAG.

DOCUMENT-IDENTIFIER: US 5972353 A

TITLE: MN proteins, polypeptides, fusion proteins and fusion polypeptides 



Priority Application Year (1) : 
1992 

Detailed Description Text (22) : 

Based upon results of the RACE analysis, the full-length MN cDNA sequence was seen to
contain a single ORF starting at position 12, with an ATG codon that is in a good
context (GCGCATGG) with the rule proposed for translation initiation [Kozak, J. Cell.
Biol., 108: 229-241 (1989)]. [See below under Mapping of MN Gene Transcription
Initiation Site for fine mapping of the 5' end of the MN gene.] The AT rich 3'
untranslated region contains a polyadenylation signal (AATAAA) preceding the end of
the cDNA by 10 bp. Surprisingly, the sequence from the original clone as well as from
four additional clones obtained from the same cDNA library did not reveal any poly(A)
tail. Moreover, as indicated above, just downstream of the poly(A) signal we found an
ATTTA motif that is thought to contribute to mRNA instability (Shaw and Kamen, supra).
This fact raised the possibility that the poly(A) tail is missing due to the specific
degradation of the MN mRNA.
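
The position of the polyadenylation signal relative to the cDNA end can be reproduced from the last 22 nucleotides of SEQ ID NO: 1 (positions 1501-1522) as printed in the sequence listing of this record. The Python sketch below is a simple motif search over that tail. The helper name is illustrative, and the tail is taken from the listing as transcribed here rather than from an independent source.

```python
# Last 22 nucleotides of SEQ ID NO: 1 (positions 1501-1522) as printed in the listing.
CDNA_3PRIME_TAIL = "TTTTAAAATAAATATTTATAAT"
TAIL_START = 1501  # 1-based position of the first base of the tail


def find_motif(seq: str, motif: str) -> list:
    """Return every 0-based start index at which motif occurs in seq."""
    return [i for i in range(len(seq) - len(motif) + 1) if seq[i:i + len(motif)] == motif]


signal_index = find_motif(CDNA_3PRIME_TAIL, "AATAAA")[0]
bases_after_signal = len(CDNA_3PRIME_TAIL) - (signal_index + len("AATAAA"))
instability_index = find_motif(CDNA_3PRIME_TAIL, "ATTTA")[0]

print(TAIL_START + signal_index)        # 1507: first base of AATAAA
print(bases_after_signal)               # 10: bases between the signal and the cDNA end
print(TAIL_START + instability_index)   # 1514: first base of ATTTA, after the signal
```

The 10 bases found after AATAAA agree with the distance stated in the text, and the ATTTA motif begins two bases after the signal ends.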

Detailed Description Text (33) : 

Table 1 below lists the splice donor and acceptor sequences that conform to consensus
splice sequences including the AG-GT motif [Mount, "A catalogue of splice junction
sequences," Nucleic Acids Res. 10: 459-472 (1982)].

Detailed Description Text (41) : 

An RNase protection assay, as described above, was also used to verify the 3' end
of the MN cDNA. That was important with respect to our previous finding that the cDNA
contains a poly(A) signal but lacks a poly(A) tail, which could be lost during the
proposed degradation of MN mRNA due to the presence of an instability motif in its 3'
untranslated region. RNP analysis of MN mRNA with the fragment of the genomic clone
XE3 covering the region of interest corroborated our data from MN cDNA sequencing,
since the 3' end of the protected fragment corresponded to the last base of MN cDNA
(position 10,752 of the genomic sequence). That site also meets the requirement for
the presence of a second signal in the genomic sequence that is needed for
transcription termination and polyadenylation [McLauchlan et al., Nucleic Acids Res.,
13: 1347 (1985)]. Motif TGTGTTAGT (nt 10,759-10,767) corresponds well to both the
consensus sequence and the position of that signal within 22 bp downstream from the
polyA signal (nt 10,737-10,742).

Detailed Description Paragraph Table (9) : 

# SEQUENCE LISTING - - - -
(1) GENERAL INFORMATION: - - (iii) NUMBER OF SEQUENCES: 86 - - - -
(2) INFORMATION FOR SEQ ID NO: 1: - - (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 1522 base
- #pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear - - (ii)
MOLECULE TYPE: cDNA - - (iii) HYPOTHETICAL: NO - - (iv) ANTI-SENSE: NO - - (xi)
SEQUENCE DESCRIPTION: SEQ ID NO: - #1: - -
ACAGTCAGCC GCATGGCTCC CCTGTGCCCC AGCCCCTGGC TCCCTCTGTT GA - #TCCCGGCC 60 - -
CCTGCTCCAG GCCTCACTGT GCAACTGCTG CTGTCACTGC TGCTTCTGAT GC - #CTGTCCAT 120 - -
CCCCAGAGGT TGCCCCGGAT GCAGGAGGAT TCCCCCTTGG GAGGAGGCTC TT - #CTGGGGAA 180 - -
GATGACCCAC TGGGCGAGGA GGATCTGCCC AGTGAAGAGG ATTCACCCAG AG - #AGGAGGAT 240 - -
CCACCCGGAG AGGAGGATCT ACCTGGAGAG GAGGATCTAC CTGGAGAGGA GG - #ATCTACCT 300 - -
GAAGTTAAGC CTAAATCAGA AGAAGAGGGC TCCCTGAAGT TAGAGGATCT AC - #CTACTGTT 360 - -
GAGGCTCCTG GAGATCCTCA AGAACCCCAG AATAATGCCC ACAGGGACAA AG - #AAGGGGAT 420 - -
GACCAGAGTC ATTGGCGCTA TGGAGGCGAC CCGCCCTGGC CCCGGGTGTC CC - #CAGCCTGC 480 - -
GCGGGCCGCT TCCAGTCCCC GGTGGATATC CGCCCCCAGC TCGCCGCCTT CT - #GCCCGGCC 540 - -
CTGCGCCCCC TGGAACTCCT GGGCTTCCAG CTCCCGCCGC TCCCAGAACT GC - #GCCTGCGC 600 - -
AACAATGGCC ACAGTGTGCA ACTGACCCTG CCTCCTGGGC TAGAGATGGC TC - #TGGGTCCC 660 - -
GGGCGGGAGT ACCGGGCTCT GCAGCTGCAT CTGCACTGGG GGGCTGCAGG TC - #GTCCGGGC 720 - -
TCGGAGCACA CTGTGGAAGG CCACCGTTTC CCTGCCGAGA TCCACGTGGT TC - #ACCTCAGC 780 - -
ACCGCCTTTG CCAGAGTTGA CGAGGCCTTG GGGCGCCCGG GAGGCCTGGC CG - #TGTTGGCC 840 - -
GCCTTTCTGG AGGAGGGCCC GGAAGAAAAC AGTGCCTATG AGCAGTTGCT GT - #CTCGCTTG 900 - -
GAAGAAATCG CTGAGGAAGG CTCAGAGACT CAGGTCCCAG GACTGGACAT AT - #CTGCACTC 960 - -
CTGCCCTCTG ACTTCAGCCG CTACTTCCAA TATGAGGGGT CTCTGACTAC AC - #CGCCCTGT 1020 - -
GCCCAGGGTG TCATCTGGAC TGTGTTTAAC CAGACAGTGA TGCTGAGTGC TA - #AGCAGCTC 1080 - -
CACACCCTCT CTGACACCCT GTGGGGACCT GGTGACTCTC GGCTACAGCT GA - #ACTTCCGA 1140 - -
GCGACGCAGC CTTTGAATGG GCGAGTGATT GAGGCCTCCT TCCCTGCTGG AG - #TGGACAGC 1200 - -
AGTCCTCGGG CTGCTGAGCC AGTCCAGCTG AATTCCTGCC TGGCTGCTGG TG - #ACATCCTA 1260 - -
GCCCTGGTTT TTGGCCTCCT TTTTGCTGTC ACCAGCGTCG CGTTCCTTGT GC - #AGATGAGA 1320 - -
AGGCAGCACA GAAGGGGAAC CAAAGGGGGT GTGAGCTACC GCCCAGCAGA GG - #TAGCCGAG 1380 - -
ACTGGAGCCT AGAGGCTGGA TCTTGGAGAA TGTGAGAAGC CAGCCAGAGG CA - #TCTGAGGG 1440 - -
GGAGCCGGTA ACTGTCCTGT CCTGCTCATT ATGCCACTTC CTTTTAACTG CC - #AAGAAATT 1500 - -
TTTTAAAATA AATATTTATA AT - # - # 1522 - - - -
(2) INFORMATION FOR SEQ ID NO: 2: - - (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 459 amino
- #acids (B) TYPE: amino acid (C) STRANDEDNESS: (D) TOPOLOGY: linear - - (ii) MOLECULE
TYPE: protein (A) DESCRIPTION: First - # 37 amino acids represent signal pe - #ptide,
and remaining amino acids represent - #mature protein - - (xi) SEQUENCE DESCRIPTION:
SEQ ID NO: - #2: - -
Met Ala Pro Leu Cys Pro Ser Pro - # Trp Leu Pro Leu Leu Ile Pro Ala -35 - # -30 - # -25 - -
Pro Ala Pro Gly Leu Thr Val Gln - # Leu Leu Leu Ser Leu Leu Leu Leu -20 - # -15 - # -10 - -
Met Pro Val His Pro Gln Arg Leu - # Pro Arg Met Gln Glu Asp Ser Pro -5 - # 1 - # 5 - # 10 - -
Leu Gly Gly Gly Ser Ser Gly Glu - # Asp Asp Pro Leu Gly Glu Glu Asp 15 - # 20 - # 25 - -
Leu Pro Ser Glu Glu Asp Ser Pro - # Arg Glu Glu Asp Pro Pro Gly Glu 30 - # 35 - # 40 - -
Glu Asp Leu Pro Gly Glu Glu Asp - # Leu Pro Gly Glu Glu Asp Leu Pro 45 - # 50 - # 55 - -
Glu Val Lys Pro Lys Ser Glu Glu - # Glu Gly Ser Leu Lys Leu Glu Asp 60 - # 65 - # 70 - # 75 - -
Leu Pro Thr Val Glu Ala Pro Gly - # Asp Pro Gln Glu Pro Gln Asn Asn - # 80 - # 85 - # 90 - -
Ala His Arg Asp Lys Glu Gly Asp - # Asp Gln Ser His Trp Arg Tyr Gly 95 - # 100 - # 105 - -
Gly Asp Pro Pro Trp Pro Arg Val - # Ser Pro Ala Cys Ala Gly Arg Phe 110 - # 115 - # 120 - -
Gln Ser Pro Val Asp Ile Arg Pro - # Gln Leu Ala Ala Phe Cys Pro Ala 125 - # 130 - # 135 - -
Leu Arg Pro Leu Glu Leu Leu Gly - # Phe Gln Leu Pro Pro Leu Pro Glu 140 - # 145 - # 150 - # 155 - -
Leu Arg Leu Arg Asn Asn Gly His - # Ser Val Gln Leu Thr Leu Pro Pro - # 160 - # 165 - # 170 - -
Gly Leu Glu Met Ala Leu Gly Pro - # Gly Arg Glu Tyr Arg Ala Leu Gln 175 - # 180 - # 185 - -
Leu His Leu His Trp Gly Ala Ala - # Gly Arg Pro Gly Ser Glu His Thr 190 - # 195 - # 200 - -
Val Glu Gly His Arg Phe Pro Ala - # Glu Ile His Val Val His Leu Ser 205 - # 210 - # 215 - -
Thr Ala Phe Ala Arg Val Asp Glu - # Ala Leu Gly Arg Pro Gly Gly Leu 220 - # 225 - # 230 - # 235 - -
Ala Val Leu Ala Ala Phe Leu Glu - # Glu Gly Pro Glu Glu Asn Ser Ala - # 240 - # 245 - # 250 - -
Tyr Glu Gln Leu Leu Ser Arg Leu - # Glu Glu Ile Ala Glu Glu Gly Ser 255 - # 260 - # 265 - -
Glu Thr Gln Val Pro Gly Leu Asp - # Ile Ser Ala Leu Leu Pro Ser Asp 270 - # 275 - # 280 - -
Phe Ser Arg Tyr Phe Gln Tyr Glu - # Gly Ser Leu Thr Thr Pro Pro Cys 285 - # 290 - # 295 - -
Ala Gln Gly Val Ile Trp Thr Val - # Phe Asn Gln Thr Val Met Leu Ser 300 - # 305 - # 310 - # 315 - -
Ala Lys Gln Leu His Thr Leu Ser - # Asp Thr Leu Trp Gly Pro Gly Asp - # 320 - # 325 - # 330 - -
Ser Arg Leu Gln Leu Asn Phe Arg - # Ala Thr Gln Pro Leu Asn Gly Arg 335 - # 340 - # 345 - -
Val Ile Glu Ala Ser Phe Pro Ala - # Gly Val Asp Ser Ser Pro Arg Ala 350 - # 355 - # 360 - -
Ala Glu Pro Val Gln Leu Asn Ser - # Cys Leu Ala Ala Gly Asp Ile Leu 365 - # 370 - # 375 - -
Ala Leu Val Phe Gly Leu Leu Phe - # Ala Val Thr Ser Val Ala Phe Leu 380 - # 385 - # 390 - # 395 - -
Val Gln Met Arg Arg Gln His Arg - # Arg Gly Thr Lys Gly Gly Val Ser - # 400 - # 405 - # 410 - -
Tyr Arg Pro Ala Glu Val Ala Glu - # Thr Gly Ala 415 - # 420 - - - -
(2) INFORMATION FOR SEQ ID NO: 3: - - (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 29 base
- #pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear - - (ii)
MOLECULE TYPE: DNA (genomic) - - (iii) HYPOTHETICAL: NO - - (iv) ANTI-SENSE: YES - -
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: - #3: - -
CGCCCAGTGG GTCATCTTCC CCAGAAGAG - # - # 29 - - - -
(2) INFORMATION FOR SEQ ID NO: 4: - - (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 19 base
- #pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear - - (ii)
MOLECULE TYPE: DNA (genomic) - - (iii) HYPOTHETICAL: NO - - (iv) ANTI-SENSE: YES - -
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: - #4: - -
GGAATCCTCC TGCATCCGG - # - # - # 19 - - - -
(2) INFORMATION FOR SEQ ID NO: 5: - - (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 10898 base
- #pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear - - (ii)
MOLECULE TYPE: DNA (genomic) - - (iii) HYPOTHETICAL: NO - - (iv) ANTI-SENSE: NO - -
(xi) SEQUENCE DESCRIPTION: SEQ ID
N0: _ #5. _ _ GGATCCTGTT GACTCGTGAC CTTACCCCCA ACCCTGTGCT CTCTGAAACA TG - #AGCTGTGT 60 
CCACTCAGGG TTAAATGGAT TAAGGGCGGT GCAAGATGTG CTTTGTTAAA CA - #GATGCTTG 120
AAGGCAGCAT GCTCGTTAAG AGTCATCACC AATCCCTAAT CTCAAGTAAT CA - #GGGACACA 180
AACACTGCGG AAGGCCGCAG GGTCCTCTGC CTAGGAAAAC CAGAGACCTT TG - #TTCACTTG 240
TTTATCTGAC CTTCCCTCCA CTATTGTCCA TGACCCTGCC AAATCCCCCT CT - #GTGAGAAA 300
CACCCAAGAA TTATCAATAA AAAAATAAAT TTAAAAAAAA AATACAAAAA AA - #AAAAAAAA 360
AAAAAAAAAA GACTTACGAA TAGTTATTGA TAAATGAATA GCTATTGGTA AA - #GCCAAGTA 420
AATGATCATA TTCAAAACCA GACGGCCATC ATCACAGCTC AAGTCTACCT GA - #TTTGATCT 480
CTTTATCATT GTCATTCTTT GGATTCACTA GATTAGTCAT CATCCTCAAA AT - #TCTCCCCC 540
AAGTTCTAAT TACGTTCCAA ACATTTAGGG GTTACATGAA GCTTGAACCT AC - #TACCTTCT 600
TTGCTTTTGA GCCATGAGTT GTAGGAATGA TGAGTTTACA CCTTACATGC TG - #GGGATTAA 660
TTTAAACTTT ACCTCTAAGT CAGTTGGGTA GCCTTTGGCT TATTTTTGTA GC - #TAATTTTG 720
TAGTTAATGG ATGCACTGTG AATCTTGCTA TGATAGTTTT CCTCCACACT TT - #GCCACTAG 780
GGGTAGGTAG GTACTCAGTT TTCAGTAATT GCTTACCTAA GACCCTAAGC CC - #TATTTCTC 840
TTGTACTGGC CTTTATCTGT AATATGGGCA TATTTAATAC AATATAATTT TT - #GGAGTTTT 900
TTTGTTTGTT TGTTTGTTTG TTTTTTTGAG ACGGAGTCTT GCATCTGTCA TG - #CCCAGGCT 960
GGAGTAGCAG TGGTGCCATC TCGGCTCACT GCAAGCTCCA CCTCCCGAGT TC - #ACGCCATT 1020
TTCCTGCCTC AGCCTCCCGA GTAGCTGGGA CTACAGGCGC CCGCCACCAT GC - #CCGGCTAA 1080
TTTTTTGTAT TTTTGGTAGA GACGGGGTTT CACCGTGTTA GCCAGAATGG TC - #TCGATCTC 1140
CTGACTTCGT GATCCACCCG CCTCGGCCTC CCAAAGTTCT GGGATTACAG GT - #GTGAGCCA 1200
CCGCACCTGG CCAATTTTTT GAGTCTTTTA AAGTAAAAAT ATGTCTTGTA AG - #CTGGTAAC 1260
TATGGTACAT TTCCTTTTAT TAATGTGGTG CTGACGGTCA TATAGGTTCT TT - 



Detailed Description Paragraph Table (10) : 



#TGAGTTTG 1320 - - GCATGCATAT GCTACTTTTT GCAGTCCTTT CATTACATTT TTCTCTCTTC AT -
#TTGAAGAG 1380 - - CATGTTATAT CTTTTAGCTT CACTTGGCTT AAAAGGTTCT CTCATTAGCC TA -
#ACACAGTG 1440 - - TCATTGTTGG TACCACTTGG ATCATAAGTG GAAAAACAGT CAAGAAATTG CA -
#CAGTAATA 1500 - - CTTGTTTGTA AGAGGGATGA TTCAGGTGAA TCTGACACTA AGAAACTCCC CT -
#ACCTGAGG 1560 - - TCTGAGATTC CTCTGACATT GCTGTATATA GGCTTTTCCT TTGACAGCCT GT -
#GACTGCGG 1620 - - ACTATTTTTC TTAAGCAAGA TATGCTAAAG TTTTGTGAGC CTTTTTCCAG AG -
#AGAGGTCT 1680 - - CATATCTGCA TCAAGTGAGA ACATATAATG TCTGCATGTT TCCATATTTC AG -
#GAATGTTT 1740 - - GCTTGTGTTT TATGCTTTTA TATAGACAGG GAAACTTGTT CCTCAGTGAC CC -
#AAAAGAGG 1800 - - TGGGAATTGT TATTGGATAT CATCATTGGC CCACGCTTTC TGACCTTGGA AA -
#CAATTAAG 1860 - - GGTTCATAAT CTCAATTCTG TCAGAATTGG TACAAGAAAT AGCTGCTATG TT -
#TCTTGACA 1920 - - TTCCACTTGG TAGGAAATAA GAATGTGAAA CTCTTCAGTT GGTGTGTGTC CC -
#TNGTTTTT 1980 - - TTGCAATTTC CTTCTTACTG TGTTAAAAAA AAGTATGATC TTGCTCTGAG AG -
#GTGAGGCA 2040 - - TTCTTAATCA TGATCTTTAA AGATCAATAA TATAATCCTT TCAAGGATTA TG -
#TCTTTATT 2100 - - ATAATAAAGA TAATTTGTCT TTAACAGAAT CAATAATATA ATCCCTTAAA GG -
#ATTATATC 2160 - - TTTGCTGGGC GCAGTGGCTC ACACCTGTAA TCCCAGCACT TTGGGTGGCC AA -
#GGTGGAAG 2220 - - GATCAAATTT GCCTACTTCT ATATTATCTT CTAAAGCAGA ATTCATCTCT CT -
#TCCCTCAA 2280 - - TATGATGATA TTGACAGGGT TTGCCCTCAC TCACTAGATT GTGAGCTCCT GC -
#TCAGGGCA 2340 - - GGTAGCGTTT TTTGTTTTTG TTTTTGTTTT TCTTTTTTGA GACAGGGTCT TG -
#CTCTGTCA 2400 - - CCCAGGCCAG AGTGCAATGG TACAGTCTCA GCTCACTGCA GCCTCAACCG CC -
#TCGGCTCA 2460 - - AACCATCATC CCATTTCAGC CTCCTGAGTA GCTGGGACTA CAGGCACATG CC -
#ATTACACC 2520 - - TGGCTAATTT TTTTGTATTT CTAGTAGAGA CAGGGTTTGG CCATGTTGCC CG -
#GGCTGGTC 2580 - - TCGAACTCCT GGACTCAAGC AATCCACCCA CCTCAGCCTC CCAAAATGAG GG -
#ACCGTGTC 2640 - - TTATTCATTT CCATGTCCCT AGTCCATAGC CCAGTGCTGG ACCTATGGTA GT -
#ACTAAATA 2700 - - AATATTTGTT GAATGCAATA GTAAATAGCA TTTCAGGGAG CAAGAACTAG AT -
#TAACAAAG 2760 - - GTGGTAAAAG GTTTGGAGAA AAAAATAATA GTTTAATTTG GCTAGAGTAT GA -
#GGGAGAGT 2820 - - AGTAGGAGAC AAGATGGAAA GGTCTCTTGG GCAAGGTTTT GAAGGAAGTT GG -
#AAGTCAGA 2880 - - AGTACACAAT GTGCATATCG TGGCAGGCAG TGGGGAGCCA ATGAAGGCTT TT -
#GAGCAGGA 2940 - - GAGTAATGTG TTGAAAAATA AATATAGGTT AAACCTATCA GAGCCCCTCT GA -
#CACATACA 3000 - - CTTGCTTTTC ATTCAAGCTC AAGTTTGTCT CCCACATACC CATTACTTAA CT -
#CACCCTCG 3060 - - GGCTCCCCTA GCAGCCTGCC CTACCTCTTT ACCTGCTTCC TGGTGGAGTC AG -
#GGATGTAT 3120 - - ACATGAGCTG CTTTCCCTCT CAGCCAGAGG ACATGGGGGG CCCCAGCTCC CC -
#TGCCTTTC 3180 - - CCCTTCTGTG CCTGGAGCTG GGAAGCAGGC CAGGGTTAGC TGAGGCTGGC TG -
#GCAAGCAG 3240 - - CTGGGTGGTG CCAGGGAGAG CCTGCATAGT GCCAGGTGGT GCCTTGGGTT CC -
#AAGCTAGT 3300 - - CCATGGCCCC GATAACCTTC TGCCTGTGCA CACACCTGCC CCTCACTCCA CC -
#CCCATCCT 3360 - - AGCTTTGGTA TGGGGGAGAG GGCACAGGGC CAGACAAACC TGTGAGACTT TG -
#GCTCCATC 3420 - - TCTGCAAAAG GGCGCTCTGT GAGTCAGCCT GCTCCCCTCC AGGCTTGCTC CT -
#CCCCCACC 3480 - - CAGCTCTCGT TTCCAATGCA CGTACAGCCC GTACACACCG TGTGCTGGGA CA -
#CCCCACAG 3540 - - TCAGCCGCAT GGCTCCCCTG TGCCCCAGCC CCTGGCTCCC TCTGTTGATC CC -
#GGCCCCTG 3600 - - CTCCAGGCCT CACTGTGCAA CTGCTGCTGT CACTGCTGCT TCTGGTGCCT GT -

#CCATCCCC 3660 - - AGAGGTTGCC CCGGATGCAG GAGGATTCCC CCTTGGGAGG AGGCTCTTCT GG -
#GGAAGATG 3720 - - ACCCACTGGG CGAGGAGGAT CTGCCCAGTG AAGAGGATTC ACCCAGAGAG GA -
#GGATCCAC 3780 - - CCGGAGAGGA GGATCTACCT GGAGAGGAGG ATCTACCTGG AGAGGAGGAT CT -
#ACCTGAAG 3840 - - TTAAGCCTAA ATCAGAAGAA GAGGGCTCCC TGAAGTTAGA GGATCTACCT AC -
#TGTTGAGG 3900 - - CTCCTGGAGA TCCTCAAGAA CCCCAGAATA ATGCCCACAG GGACAAAGAA GG -
#TAAGTGGT 3960 - - CATCAATCTC CAAATCCAGG TTCCAGGAGG TTCATGACTC CCCTCCCATA CC -
#CCAGCCTA 4020 - - GGCTCTGTTC ACTCAGGGAA GGAGGGGAGA CTGTACTCCC CACAGAAGCC CT -
#TCCAGAGG 4080 - - TCCCATACCA ATATCCCCAT CCCCACTCTC GGAGGTAGAA AGGGACAGAT GT -
#GGAGAGAA 4140 - - AATAAAAAGG GTGCAAAAGG AGAGAGGTGA GCTGGATGAG ATGGGAGAGA AG -
#GGGGAGGC 4200 - - TGGAGAAGAG AAAGGGATGA GAACTGCAGA TGAGAGAAAA AATGTGCAGA CA -
#GAGGAAAA 4260 - - AAATAGGTGG AGAAGGAGAG TCAGAGAGTT TGAGGGGAAG AGAAAAGGAA AG -
#CTTGGGAG 4320 - - GTGAAGTGGG TACCAGAGAC AAGCAAGAAG AGCTGGTAGA AGTCATCTCA TC -
#TTAGGCTA 4380 - - CAATGAGGAA TTGAGACCTA GGAAGAAGGG ACACAGCAGG TAGAGAAACG TG -
#GCTTCTTG 4440 - - ACTCCCAAGC CAGGAATTTG GGGAAAGGGG TTGGAGACCA TACAAGGCAG AG -
#GGATGAGT 4500 - - GGGGAGAAGA AAGAAGGGAG AAAGGAAAGA TGGTGTACTC ACTCATTTGG GA -
#CTCAGGAC 4560 - - TGAAGTGCCC ACTCACTTTT TTTTTTTTTT TTTTTGAGAC AAACTTTCAC TT -
#TTGTTGCC 4620 - - CAGGCTGGAG TGCAATGGCG CGATCTCGGC TCACTGCAAC CTCCACCTCC CG -
#GGTTCAAG 4680 - - TGATTCTCCT GCCTCAGCCT CTAGCCAAGT AGCTGCGATT ACAGGCATGC GC -
#CACCACGC 4740 - - CCGGCTAATT TTTGTATTTT TAGTAGAGAC GGGGTTTCGC CATGTTGGTC AG -
#GCTGGTCT 4800 - - CGAACTCCTG ATCTCAGGTG ATCCAACCAC CCTGGCCTCC CAAAGTGCTG GG -
#ATTATAGG 4860 - - CGTGAGCCAC AGCGCCTGGC CTGAAGCAGC CACTCACTTT TACAGACCCT AA -
#GACAATGA 4920 - - TTGCAAGCTG GTAGGATTGC TGTTTGGCCC ACCCAGCTGC GGTGTTGAGT TT -
#GGGTGCGG 4980 - - TCTCCTGTGC TTTGCACCTG GCCCGCTTAA GGCATTTGTT ACCCGTAATG CT -
#CCTGTAAG 5040 - - GCATCTGCGT TTGTGACATC GTTTTGGTCG CCAGGAAGGG ATTGGGGCTC TA -
#AGCTTGAG 5100 - - CGGTTCATCC TTTTCATTTA TACAGGGGAT GACCAGAGTC ATTGGCGCTA TG -
#GAGGTGAG 5160 - - ACACCCACCC GCTGCACAGA CCCAATCTGG GAACCCAGCT CTGTGGATCT CC -
#CCTACAGC 5220 - - CGTCCCTGAA CACTGGTCCC GGGCGTCCCA CCCGCCGCCC ACCGTCCCAC CC -
#CCTCACCT 5280 - - TTTCTACCCG GGTTCCCTAA GTTCCTGACC TAGGCGTCAG ACTTCCTCAC TA -
#TACTCTCC 5340 - - CACCCCAGGC GACCCGCCCT GGCCCCGGGT GTCCCCAGCC TGCGCGGGCC GC -
#TTCCAGTC 5400 - - CCCGGTGGAT ATCCGCCCCC AGCTCGCCGC CTTCTGCCCG GCCCTGCGCC CC -
#CTGGAACT 5460 - - CCTGGGCTTC CAGCTCCCGC CGCTCCCAGA ACTGCGCCTG CGCAACAATG GC -
#CACAGTGG 5520 - - TGAGGGGGTC TCCCCGCCGA GACTTGGGGA TGGGGCGGGG CGCAGGGAAG GG -
#AACCGTCG 5580 - - CGCAGTGCCT GCCCGGGGGT TGGGCTGGCC CTACCGGGCG GGGCCGGCTC AC -
#TTGCCTCT 5640 - - CCCTACGCAG TGCAACTGAC CCTGCCTCCT GGGCTAGAGA TGGCTCTGGG TC -
#CCGGGCGG 5700 - - GAGTACCGGG CTCTGCAGCT GCATCTGCAC TGGGGGGCTG CAGGTCGTCC GG -
#GCTCGGAG 5760 - - CACACTGTGG AAGGCCACCG TTTCCCTGCC GAGGTGAGCG CGGACTGGCC GA -
#GAAGGGGC 5820 - - AAAGGAGCGG GGCGGACGGG GGCCAGAGAC GTGGCCCTCT CCTACCCTCG TG -
#TCCTTTTC 5880 - - AGATCCACGT GGTTCACCTC AGCACCGCCT TTGCCAGAGT TGACGAGGCC TT -
#GGGGCGCC 5940 - - CGGGAGGCCT GGCCGTGTTG GCCGCCTTTC TGGAGGTACC AGATCCTGGA CA -
#CCCCCTAC 6000 - - TCCCCGCTTT CCCATCCCAT GCTCCTCCCG GACTCTATCG TGGAGCCAGA GA -
#CCCCATCC 6060 - - CAGCAAGCTC ACTCAGGCCC CTGGCTGACA AACTCATTCA CGCACTGTTT GT -
#TCATTTAA 6120 - - CACCCACTGT GAACCAGGCA CCAGCCCCCA ACAAGGATTC TGAAGCTGTA GG -
#TCCTTGCC 6180 - - TCTAAGGAGC CCACAGCCAG TGGGGGAGGC TGACATGACA GACACATAGG AA -
#GGACATAG 6240 - - TAAAGATGGT GGTCACAGAG GAGGTGACAC TTAAAGCCTT CACTGGTAGA AA -
#AGAAAAGG 6300 - - AGGTGTTCAT TGCAGAGGAA ACAGAATGTG CAAAGACTCA GAATATGGCC TA -
#TTTAGGGA 6360 - - ATGGCTACAT ACACCATGAT TAGAGGAGGC CCAGTAAAGG GAAGGGATGG TG -
#AGATGCCT 6420 - - GCTAGGTTCA CTCACTCACT TTTATTTATT TATTTATTTT TTTGACAGTC TC -
#TCTGTCGC 6480 - - CCAGGCTGGA GTGCAGTGGT GTGATCTTGG GTCACTGCAA CTTCCGCCTC CC -
#GGGTTCAA 6540 - - GGGATTCTCC TGCCTCAGCT TCCTGAGTAG CTGGGGTTAC AGGTGTGTGC CA -
#CCATGCCC 6600 - - AGCTAATTTT TTTTTGTATT TTTAGTAGAC AGGGTTTCAC CATGTTGGTC AG -
#GCTGGTCT 6660 - - CAAACTCCTG GCCTCAAGTG ATCCGCCTGA CTCAGCCTAC CAAAGTGCTG AT -
#TACAAGTG 6720 - - TGAGCCACCG TGCCCAGCCA CACTCACTGA TTCTTTAATG CCAGCCACAC AG -
#CACAAAGT 6780 - - TCAGAGAAAT GCCTCCATCA TAGCATGTCA ATATGTTCAT ACTCTTAGGT TC -
#ATGATGTT 6840 - - CTTAACATTA GGTTCATAAG CAAAATAAGA AAAAAGAATA ATAAATAAAA GA -
#AGTGGCAT 6900 - - GTCAGGACCT CACCTGAAAA GCCAAACACA GAATCATGAA GGTGAATGCA GA -
#GGTGACAC 6960 - - CAACACAAAG GTGTATATAT GGTTTCCTGT GGGGAGTATG TACGGAGGCA GC -
#AGTGAGTG 7020 - - AGACTGCAAA CGTCAGAAGG GCACGGGTCA CTGAGAGCCT AGTATCCTAG TA -
#AAGTGGGC 7080 - - TCTCTCCCTC TCTCTCCAGC TTGTCATTGA AAACCAGTCC ACCAAGCTTG TT -
#GGTTCGCA 7140 - - CAGCAAGAGT ACATAGAGTT TGAAATAATA CATAGGATTT TAAGAGGGAG AC -
#ACTGTCTC 7200 - - TAAAAAAAAA AACAACAGCA ACAACAAAAA GCAACAACCA TTACAATTTT AT -
#GTTCCCTC 7260 - - AGCATTCTCA GAGCTGAGGA ATGGGAGAGG ACTATGGGAA CCCCCTTCAT GT -
#TCCGGCCT 7320 - - TCAGCCATGG CCCTGGATAC ATGCACTCAT CTGTCTTACA ATGTCATTCC CC -
#CAGGAGGG 7380 - - CCCGGAAGAA AACAGTGCCT ATGAGCAGTT GCTGTCTCGC TTGGAAGAAA TC -
#GCTGAGGA 7440 - - AGGTCAGTTT GTTGGTCTGG CCACTAATCT CTGTGGCCTA GTTCATAAAG AA -






#TCACCCTT 7500 - - TGGAGCTTCA GGTCTGAGGC TGGAGATGGG CTCCCTCCAG TGCAGGAGGG AT -
#TGAAGCAT 7560 - - GAGCCAGCGC TCATCTTGAT AATAACCATG AAGCTGACAG ACACAGTTAC CC -
#GCAAACGG 7620 - - CTGCCTACAG ATTGAAAACC AAGCAAAAAC CGCCGGGCAC GGTGGCTCAC GC -
#CTGTAATC 7680 - - CCAGCACTTT GGGAGGCCAA GGCAGGTGGA TCACGAGGTC AAGAGATCAA GA -
#CCATCCTG 7740 - - GCCAACATGG TGAAACCCCA TCTCTACTAA AAATACGAAA AAATAGCCAG GC -
#GTGGTGGC 7800 - - GGGTGCCTGT AATCCCAGCT ACTCGGGAGG CTGAGGCAGG AGAATGGCAT GA -
#ACCCGGGA 7860 - - GGCAGAAGTT GCAGTGAGCC GAGATCGTGC CACTGCACTC CAGCCTGGGC AA -
#CAGAGCGA 7920 - - GACTCTTGTC TCAAAAAAAA AAAAAAAAAA GAAAACCAAG CAAAAACCAA AA -
#TGAGACAA 7980 - - AAAAAACAAG ACCAAAAAAT GGTGTTTGGA AATTGTCAAG GTCAAGTCTG GA -
#GAGCTAAA 8040 - - CTTTTTCTGA GAACTGTTTA TCTTTAATAA GCATCAAATA TTTTAACTTT GT -
#AAATACTT 8100 - - TTGTTGGAAA TCGTTCTCTT CTTAGTCACT CTTGGGTCAT TTTAAATCTC AC -
#TTACTCTA 8160 - - CTAGACCTTT TAGGTTTCTG CTAGACTAGG TAGAACTCTG CCTTTGCATT TC -
#TTGTGTCT 8220 - - GTTTTGTATA GTTATCAATA TTCATATTTA TTTACAAGTT ATTCAGATCA TT -
#TTTTCTTT 8280 - - TCTTTTTTTT TTTTTTTTTT TTTTTTACAT CTTTAGTAGA GACAGGGTTT CA -
#CCATATTG 8340 - - GCCAGGCTGC TCTCAAACTC CTGACCTTGT GATCCACCAG CCTCGGCCTC CC -
#AAAGTGCT 8400 - - GGGATTCATT TTTTCTTTTT AATTTGCTCT GGGCTTAAAC TTGTGGCCCA GC -
#ACTTTATG 8460 - - ATGGTACACA GAGTTAAGAG TGTAGACTCA GACGGTCTTT CTTCTTTCCT TC -
#TCTTCCTT 8520 - - CCTCCCTTCC CTCCCACCTT CCCTTCTCTC CTTCCTTTCT TTCTTCCTCT CT -



Detailed Description Paragraph Table (11) : 

#TGCTTCCT 8580 - - CAGGCCTCTT CCAGTTGCTC CAAAGCCCTG TACTTTTTTT TGAGTTAACG TC -
#TTATGGGA 8640 - - AGGGCCTGCA CTTAGTGAAG AAGTGGTCTC AGAGTTGAGT TACCTTGGCT TC -
#TGGGAGGT 8700 - - GAAACTGTAT CCCTATACCC TGAAGCTTTA AGGGGGTGCA ATGTAGATGA GA -
#CCCCAACA 8760 - - TAGATCCTCT TCACAGGCTC AGAGACTCAG GTCCCAGGAC TGGACATATC TG -
#CACTCCTG 8820 - - CCCTCTGACT TCAGCCGCTA CTTCCAATAT GAGGGGTCTC TGACTACACC GC -
#CCTGTGCC 8880 - - CAGGGTGTCA TCTGGACTGT GTTTAACCAG ACAGTGATGC TGAGTGCTAA GC -
#AGGTGGGC 8940 - - CTGGGGTGTG TGTGGACACA GTGGGTGCGG GGGAAAGAGG ATGTAAGATG AG -
#ATGAGAAA 9000 - - CAGGAGAAGA AAGAAATCAA GGCTGGGCTC TGTGGCTTAC GCCTATAATC CC -
#ACCACGTT 9060 - - GGGAGGCTGA GGTGGGAGAA TGGTTTGAGC CCAGGAGTTC AAGACAAGGC GG -
#GGCAACAT 9120 - - AGTGTGACCC CATCTCTACC AAAAAAACCC CAACAAAACC AAAAATAGCC GG -
#GCATGGTG 9180 - - GTATGCGGCC TAGTCCCAGC TACTCAAGGA GGCTGAGGTG GGAAGATCGC TT -
#GATTCCAG 9240 - - GAGTTTGAGA CTGCAGTGAG CTATGATCCC ACCACTGCCT ACCATCTTTA GG -
#ATACATTT 9300 - - ATTTATTTAT AAAAGAAATC AAGAGGCTGG ATGGGGAATA CAGGAGCTGG AG -
#GGTGGAGC 9360 - - CCTGAGGTGC TGGTTGTGAG CTGGCCTGGG ACCCTTGTTT CCTGTCATGC CA -
#TGAACCCA 9420 - - CCCACACTGT CCACTGACCT CCCTAGCTCC ACACCCTCTC TGACACCCTG TG -
#GGGACCTG 9480 - - GTGACTCTCG GCTACAGCTG AACTTCCGAG CGACGCAGCC TTTGAATGGG CG -
#AGTGATTG 9540 - - AGGCCTCCTT CCCTGCTGGA GTGGACAGCA GTCCTCGGGC TGCTGAGCCA GG -
#TACAGCTT 9600 - - TGTCTGGTTT CCCCCCAGCC AGTAGTCCCT TATCCTCCCA TGTGTGTGCC AG -
#TGTCTGTC 9660 - - ATTGGTGGTC ACAGCCCGCC TCTCACATCT CCTTTTTCTC TCCAGTCCAG CT -
#GAATTCCT 9720 - - GCCTGGCTGC TGGTGAGTCT GCCCCTCCTC TTGGTCCTGA TGCCAGGAGA CT -
#CCTCAGCA 9780 - - CCATTCAGCC CCAGGGCTGC TCAGGACCGC CTCTGCTCCC TCTCCTTTTC TG -
#CAGAACAG 9840 - - ACCCCAACCC CAATATTAGA GAGGCAGATC ATGGTGGGGA TTCCCCCATT GT -
#CCCCAGAG 9900 - - GCTAATTGAT TAGAATGAAG CTTGAGAAAT CTCCCAGCAT CCCTCTCGCA AA -
#AGAATCCC 9960 - - CCCCCCTTTT TTTAAAGATA GGGTCTCACT CTGTTTGCCC CAGGCTGGGG TG -
#TTGTGGCA 10020 - - CGATCATAGC TCACTGCAGC CTCGAACTCC TAGGCTCAGG CAATCCTTTC AC -
#CTTAGCTT 10080 - - CTCAAAGCAC TGGGACTGTA GGCATGAGCC ACTGTGCCTG GCCCCAAACG GC -
#CCTTTTAC 10140 - - TTGGCTTTTA GGAAGCAAAA ACGGTGCTTA TCTTACCCCT TCTCGTGTAT CC -
#ACCCTCAT 10200 - - CCCTTGGCTG GCCTCTTCTG GAGACTGAGG CACTATGGGG CTGCCTGAGA AC -
#TCGGGGCA 10260 - - GGGGTGGTGG AGTGCACTGA GGCAGGTGTT GAGGAACTCT GCAGACCCCT CT -
#TCCTTCCC 10320 - - AAAGCAGCCC TCTCTGCTCT CCATCGCAGG TGACATCCTA GCCCTGGTTT TT -
#GGCCTCCT 10380 - - TTTTGCTGTC ACCAGCGTCG CGTTCCTTGT GCAGATGAGA AGGCAGCACA GG -
#TATTACAC 10440 - - TGACCCTTTC TTCAGGCACA AGCTTCCCCC ACCCTTGTGG AGTCACTTCA TG -
#CAAAGCGC 10500 - - ATGCAAATGA GCTGCTCCTG GGCCAGTTTT CTGATTAGCC TTTCCTGTTG TG -
#TACACACA 10560 - - GAAGGGGAAC CAAAGGGGGT GTGAGCTACC GCCCAGCAGA GGTAGCCGAG AC -
#TGGAGCCT 10620 - - AGAGGCTGGA TCTTGGAGAA TGTGAGAAGC CAGCCAGAGG CATCTGAGGG GG -
#AGCCGGTA 10680 - - ACTGTCCTGT CCTGCTCATT ATGCCACTTC CTTTTAACTG CCAAGAAATT TT -
#TTAAAATA 10740 - - AATATTTATA ATAAAATATG TGTTAGTCAC CTTTGTTCCC CAAATCAGAA GG -
#AGGTATTT 10800 - - GAATTTCCTA TTACTGTTAT TAGCACCAAT TTAGTGGTAA TGCATTTATT CT -
#ATTACAGT 10860 - - TCGGCCTCCT TCCACACATC ACTCCAATGT GTTGCTCC 10898
(2) INFORMATION FOR SEQ ID NO: 6:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 37 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide (A) DESCRIPTION: Signal peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:
Met Ala Pro Leu Cys Pro Ser Pro Trp Leu Pro Leu Leu Ile Pro Ala
1               5                   10                  15
Pro Ala Pro Gly Leu Thr Val Gln Leu Leu Leu Ser Leu Leu Leu Leu
            20                  25                  30
Met Pro Val His Pro
        35

(2) INFORMATION FOR SEQ ID NO: 7:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 25 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid (A) DESCRIPTION: /desc = "primer"
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: YES
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:
TGGGGTTCTT GAGGATCTCC AGGAG 25

(2) INFORMATION FOR SEQ ID NO: 8:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 26 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid (A) DESCRIPTION: /desc = "primer"
(iii) HYPOTHETICAL: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:
CTCTAACTTC AGGGAGCCCT CTTCTT 26

(2) INFORMATION FOR SEQ ID NO: 9:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 48 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid (A) DESCRIPTION: /desc = "primer"
(iii) HYPOTHETICAL: NO
(ix) FEATURE: N stands for inosine
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:
CUACUACUAC UAGGCCACGC GTCGACTAGT ACGGGNNGGG NNGGGNNG 48

(2) INFORMATION FOR SEQ ID NO: 10:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 6 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(v) FRAGMENT TYPE: internal
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:
Glu Glu Asp Leu Pro Ser
1               5

(2) INFORMATION FOR SEQ ID NO: 11:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 6 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(v) FRAGMENT TYPE: internal
(ix) FEATURE: (A) NAME/KEY: Peptide (B) LOCATION: 55..60
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:
Gly Glu Asp Asp Pro Leu
1               5

(2) INFORMATION FOR SEQ ID NO: 12:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 21 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(v) FRAGMENT TYPE: internal
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:
Asn Asn Ala His Arg Asp Lys Glu Gly Asp Asp Gln Ser His Trp Arg
1               5                   10                  15
Tyr Gly Gly Asp Pro
            20

(2) INFORMATION FOR SEQ ID NO: 13:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 16 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(v) FRAGMENT TYPE: internal
(ix) FEATURE: (A) NAME/KEY: Peptide (B) LOCATION: 36..51
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:
His Pro Gln Arg Leu Pro Arg Met Gln Glu Asp Ser Pro Leu Gly Gly
1               5                   10                  15

(2) INFORMATION FOR SEQ ID NO: 14:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 24 amino acids (B) TYPE: amino acid (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(v) FRAGMENT TYPE: internal
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:
Glu Glu Asp Ser Pro Arg Glu Glu Asp Pro Pro Gly Glu Glu Asp Leu
1               5                   10                  15
Pro Gly Glu Glu Asp Leu Pro Gly
            20

(2) INFORMATION FOR SEQ ID NO: 15:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 13 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(v) FRAGMENT TYPE: internal
(ix) FEATURE: (A) NAME/KEY: Peptide (B) LOCATION: 279..291
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:
Leu Glu Glu Gly Pro Glu Glu Asn Ser Ala Tyr Glu Gln
1               5                   10

(2) INFORMATION FOR SEQ ID NO: 16:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 16 amino acids (B) TYPE: amino acid (C) STRANDEDNESS: (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(v) FRAGMENT TYPE: internal
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:
Met Arg Arg Gln His Arg Arg Gly Thr Lys Gly Gly Val Ser Tyr Arg
1               5                   10                  15

(2) INFORMATION FOR SEQ ID NO: 17:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 45 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:
GTCGCTAGCT CCATGGGTCA TATGCAGAGG TTGCCCCGGA TGCAG 45

(2) INFORMATION FOR SEQ ID NO: 18:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 43 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:
GAAGATCTCT TACTCGAGCA TTCTCCAAGA TCCAGCCTCT AGG 43

(2) INFORMATION FOR SEQ ID NO: 19:
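
The declared LENGTH fields and the ANTI-SENSE flags in the entries above can be cross-checked against the printed sequences. The following is a minimal sketch, not part of the listing: the helper names `clean` and `reverse_complement` are hypothetical, and the genomic excerpt is hand-copied from SEQ ID NO: 5 (roughly positions 3661 to 3740), which is an assumption about where the primers anneal rather than something the listing states.

```python
import re


def clean(listing_text: str) -> str:
    """Drop spaces, position counters and '- #' markers, keeping base letters only."""
    return re.sub(r"[^ACGTUN]", "", listing_text.upper())


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (N is left as N)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


# Entries as printed in the listing, paired with their declared LENGTH.
PRIMERS = {
    3: ("CGCCCAGTGG GTCATCTTCC CCAGAAGAG - # 29", 29),
    4: ("GGAATCCTCC TGCATCCGG - # 19", 19),
    17: ("GTCGCTAGCT CCATGGGTCA TATGCAGAGG TTGCCCCGGA TGCAG - # 45", 45),
    18: ("GAAGATCTCT TACTCGAGCA TTCTCCAAGA TCCAGCCTCT AGG - # 43", 43),
}

# Hand-copied excerpt of SEQ ID NO: 5 (assumed region, see lead-in).
GENOMIC_EXCERPT = clean(
    "AGAGGTTGCC CCGGATGCAG GAGGATTCCC CCTTGGGAGG AGGCTCTTCT GG - #GGAAGATG 3720 "
    "ACCCACTGGG CGAGGAGGAT"
)

# Does each printed sequence have the declared number of bases?
lengths_ok = {
    seq_id: len(clean(text)) == declared
    for seq_id, (text, declared) in PRIMERS.items()
}

# Do the two anti-sense entries match the genomic strand once reverse-complemented?
antisense_found = {
    seq_id: reverse_complement(clean(PRIMERS[seq_id][0])) in GENOMIC_EXCERPT
    for seq_id in (3, 4)
}

print(lengths_ok)
print(antisense_found)
```

Stripping everything except base letters makes the check tolerant of the spacing and counter artifacts that appear in the printed rows, at the cost of not detecting a base that was misread as a different base.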

Detailed Description Paragraph Table (13) : 

GCTCAGAGAC TCAGGTCCCA GGACTGGACA TATCTGCACT CCTGCCCTCT GA - #CTTCAGCC 60
GCTACTTCCA ATATGAGGGG TCTCTGACTA CACCGCCCTG TGCCCAGGGT GT - #CATCTGGA 120
CTGTGTTTAA CCAGACAGTG ATGCTGAGTG CTAAGCAG 158

(2) INFORMATION FOR SEQ ID NO: 35:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 145 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic) (A) DESCRIPTION: 8th MN exon
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35:
CTCCACACCC TCTCTGACAC CCTGTGGGGA CCTGGTGACT CTCGGCTACA GC - #TGAACTTC 60
CGAGCGACGC AGCCTTTGAA TGGGCGAGTG ATTGAGGCCT CCTTCCCTGC TG - #GAGTGGAC 120
AGCAGTCCTC GGGCTGCTGA GCCAG 145

(2) INFORMATION FOR SEQ ID NO: 36:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 27 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic) (A) DESCRIPTION: 9th MN exon
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36:
TCCAGCTGAA TTCCTGCCTG GCTGCTG 27

(2) INFORMATION FOR SEQ ID NO: 37:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 82 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic) (A) DESCRIPTION: 10th MN exon
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37:
GTGACATCCT AGCCCTGGTT TTTGGCCTCC TTTTTGCTGT CACCAGCGTC GC - #GTTCCTTG 60
TGCAGATGAG AAGGCAGCAC AG 82

(2) INFORMATION FOR SEQ ID NO: 38:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 191 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic) (A) DESCRIPTION: 11th MN exon
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38:
AAGGGGAACC AAAGGGGGTG TGAGCTACCG CCCAGCAGAG GTAGCCGAGA CT - #GGAGCCTA 60
GAGGCTGGAT CTTGGAGAAT GTGAGAAGCC AGCCAGAGGC ATCTGAGGGG GA - #GCCGGTAA 120
CTGTCCTGTC CTGCTCATTA TGCCACTTCC TTTTAACTGC CAAGAAATTT TT - #TAAAATAA 180
ATATTTATAA T 191

(2) INFORMATION FOR SEQ ID NO: 39:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 1174 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic) (A) DESCRIPTION: 1st MN intron
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39:
GTAAGTGGTC ATCAATCTCC AAATCCAGGT TCCAGGAGGT TCATGACTCC CC - #TCCCATAC 60
CCCAGCCTAG GCTCTGTTCA CTCAGGGAAG GAGGGGAGAC TGTACTCCCC AC - #AGAAGCCC 120
TTCCAGAGGT CCCATACCAA TATCCCCATC CCCACTCTCG GAGGTAGAAA GG - #GACAGATG 180
TGGAGAGAAA ATAAAAAGGG TGCAAAAGGA GAGAGGTGAG CTGGATGAGA TG - #GGAGAGAA 240
GGGGGAGGCT GGAGAAGAGA AAGGGATGAG AACTGCAGAT GAGAGAAAAA AT - #GTGCAGAC 300
AGAGGAAAAA AATAGGTGGA GAAGGAGAGT CAGAGAGTTT GAGGGGAAGA GA - #AAAGGAAA 360
GCTTGGGAGG TGAAGTGGGT ACCAGAGACA AGCAAGAAGA GCTGGTAGAA GT - #CATCTCAT 420
CTTAGGCTAC AATGAGGAAT TGAGACCTAG GAAGAAGGGA CACAGCAGGT AG - #AGAAACGT 480
GGCTTCTTGA CTCCCAAGCC AGGAATTTGG GGAAAGGGGT TGGAGACCAT AC - #AAGGCAGA 540
GGGATGAGTG GGGAGAAGAA AGAAGGGAGA AAGGAAAGAT GGTGTACTCA CT - #CATTTGGG 600
ACTCAGGACT GAAGTGCCCA CTCACTTTTT TTTTTTTTTT TTTTGAGACA AA - #CTTTCACT 660
TTTGTTGCCC AGGCTGGAGT GCAATGGCGC GATCTCGGCT CACTGCAACC TC - #CACCTCCC 720
GGGTTCAAGT GATTCTCCTG CCTCAGCCTC TAGCCAAGTA GCTGCGATTA CA - #GGCATGCG 780
CCACCACGCC CGGCTAATTT TTGTATTTTT AGTAGAGACG GGGTTTCGCC AT - #GTTGGTCA 840
GGCTGGTCTC GAACTCCTGA TCTCAGGTGA TCCAACCACC CTGGCCTCCC AA - #AGTGCTGG 900
GATTATAGGC GTGAGCCACA GCGCCTGGCC TGAAGCAGCC ACTCACTTTT AC - #AGACCCTA 960
AGACAATGAT TGCAAGCTGG TAGGATTGCT GTTTGGCCCA CCCAGCTGCG GT - #GTTGAGTT 1020
TGGGTGCGGT CTCCTGTGCT TTGCACCTGG CCCGCTTAAG GCATTTGTTA CC - #CGTAATGC 1080
TCCTGTAAGG CATCTGCGTT TGTGACATCG TTTTGGTCGC CAGGAAGGGA TT - #GGGGCTCT 1140
AAGCTTGAGC GGTTCATCCT TTTCATTTAT ACAG 1174

(2) INFORMATION FOR SEQ ID NO: 40:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 193 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic) (A) DESCRIPTION: 2nd MN intron
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40:
GTGAGACACC CACCCGCTGC ACAGACCCAA TCTGGGAACC CAGCTCTGTG GA - #TCTCCCCT 60
ACAGCCGTCC CTGAACACTG GTCCCGGGCG TCCCACCCGC CGCCCACCGT CC - #CACCCCCT 120
CACCTTTTCT ACCCGGGTTC CCTAAGTTCC TGACCTAGGC GTCAGACTTC CT - #CACTATAC 180
TCTCCCACCC CAG 193

(2) INFORMATION FOR SEQ ID NO: 41:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 131 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic) (A) DESCRIPTION: 3rd MN intron
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41:
GTGAGGGGGT CTCCCCGCCG AGACTTGGGG ATGGGGCGGG GCGCAGGGAA GG - #GAACCGTC 60
GCGCAGTGCC TGCCCGGGGG TTGGGCTGGC CCTACCGGGC GGGGCCGGCT CA - #CTTGCCTC 120
TCCCTACGCA G 131

(2) INFORMATION FOR SEQ ID NO: 42:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 89 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic) (A) DESCRIPTION: 4th MN intron
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42:
GTGAGCGCGG ACTGGCCGAG AAGGGGCAAA GGAGCGGGGC GGACGGGGGC CA - #GAGACGTG 60
GCCCTCTCCT ACCCTCGTGT CCTTTTCAG 89

(2) INFORMATION FOR SEQ ID NO: 43:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 1400 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic) (A) DESCRIPTION: 5th MN intron
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43:
GTACCAGATC CTGGACACCC CCTACTCCCC GCTTTCCCAT CCCATGCTCC TC - #CCGGACTC 60
TATCGTGGAG CCAGAGACCC CATCCCAGCA AGCTCACTCA GGCCCCTGGC TG - #ACAAACTC 120
ATTCACGCAC TGTTTGTTCA TTTAACACCC ACTGTGAACC AGGCACCAGC CC - #CCAACAAG 180
GATTCTGAAG CTGTAGGTCC TTGCCTCTAA GGAGCCCACA GCCAGTGGGG GA - #GGCTGACA 240
TGACAGACAC ATAGGAAGGA CATAGTAAAG ATGGTGGTCA CAGAGGAGGT GA - #CACTTAAA 300
GCCTTCACTG GTAGAAAAGA AAAGGAGGTG TTCATTGCAG AGGAAACAGA AT - #GTGCAAAG 360
ACTCAGAATA TGGCCTATTT AGGGAATGGC TACATACACC ATGATTAGAG GA - #GGCCCAGT 420
AAAGGGAAGG GATGGTGAGA TGCCTGCTAG GTTCACTCAC TCACTTTTAT TT - #ATTTATTT 480
ATTTTTTTGA CAGTCTCTCT GTCGCCCAGG CTGGAGTGCA GTGGTGTGAT CT - #TGGGTCAC 540
TGCAACTTCC GCCTCCCGGG TTCAAGGGAT TCTCCTGCCT CAGCTTCCTG AG - #TAGCTGGG 600
GTTACAGGTG TGTGCCACCA TGCCCAGCTA ATTTTTTTTT GTATTTTTAG TA - #GACAGGGT 660
TTCACCATGT TGGTCAGGCT GGTCTCAAAC TCCTGGCCTC AAGTGATCCG CC - #TGACTCAG 720
CCTACCAAAG TGCTGATTAC AAGTGTGAGC CACCGTGCCC AGCCACACTC AC - #TGATTCTT 780
TAATGCCAGC CACACAGCAC AAAGTTCAGA GAAATGCCTC CATCATAGCA TG - #TCAATATG 840
TTCATACTCT TAGGTTCATG ATGTTCTTAA CATTAGGTTC ATAAGCAAAA TA - #AGAAAAAA 900
GAATAATAAA TAAAAGAAGT GGCATGTCAG GACCTCACCT GAAAAGCCAA AC - #ACAGAATC 960
ATGAAGGTGA ATGCAGAGGT GACACCAACA CAAAGGTGTA TATATGGTTT CC - #TGTGGGGA 1020
GTATGTACGG AGGCAGCAGT GAGTGAGACT GCAAACGTCA GAAGGGCACG GG - #TCACTGAG 1080
AGCCTAGTAT CCTAGTAAAG TGGGCTCTCT CCCTCTCTCT CCAGCTTGTC AT - #TGAAAACC 1140
AGTCCACCAA GCTTGTTGGT TCGCACAGCA AGAGTACATA GAGTTTGAAA TA - #ATACATAG 1200
GATTTTAAGA GGGAGACACT GTCTCTAAAA AAAAAAACAA CAGCAACAAC AA - #AAAGCAAC 1260
AACCATTACA ATTTTATGTT CCCTCAGCAT TCTCAGAGCT GAGGAATGGG AG - #AGGACTAT 1320
GGGAACCCCC TTCATGTTCC GGCCTTCAGC CATGGCCCTG GATACATGCA CT - #CATCTGTC 1380
TTACAATGTC ATTCCCCCAG 1400

(2) INFORMATION FOR SEQ ID NO: 44:
(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 1334 base pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic) (A) DESCRIPTION: 6th MN intron
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44:
GTCAGTTTGT TGGTCTGGCC ACTAATCTCT GTGGCCTAGT TCATAAAGAA TC - #ACCCTTTG 60
GAGCTTCAGG TCTGAGGCTG GAGATGGGCT CCCTCCAGTG CAGGAGGGAT TG - #AAGCATGA 120
GCCAGCGCTC ATCTTGATAA TAACCATGAA GCTGACAGAC ACAGTTACCC GC - #AAACGGCT 180
GCCTACAGAT TGAAAACCAA GCAAAAACCG CCGGGCACGG TGGCTCACGC CT - #GTAATCCC 240
AGCACTTTGG GAGGCCAAGG CAGGTGGATC ACGAGGTCAA GAGATCAAGA CC -

Detailed Description Paragraph Table (14) : 

#ATCCTGGC 3 00 - - CAACATGGTG AAACCCCATC TCTACTAAAA ATACGAAAAA ATAGCCAGGC GT - 
#GGTGGCGG 3 60 - - GTGCCTGTAA TCCCAGCTAC TCGGGAGGCT GAGGCAGGAG AATGGCATGA AC - 
#CCGGGAGG 420 - - CAGAAGTTGC AGTGAGCCGA GATCGTGCCA CTGCACTCCA GCCTGGGCAA CA - 
#GAGCGAGA 4 80 - - CTCTTGTCTC AAAAAAAAAA AAAAAAAAGA AAACCAAGCA AAAACCAAAA TG - 
#AGACAAAA 54 0 - - AAAACAAGAC CAAAAAATGG TGTTTGGAAA TTGTCAAGGT CAAGTCTGGA GA - 
#GCTAAACT 600 - - TTTTCTGAGA ACTGTTTATC TTTAATAAGC ATCAAATATT TTAACTTTGT AA - 
#ATACTTTT 660 - - GTTGGAAATC GTTCTCTTCT TAGTCACTCT TGGGTCATTT TAAATCTCAC TT - 
#ACTCTACT 72 0 - - AGACCTTTTA GGTTTCTGCT AGACTAGGTA GAACTCTGCC TTTGCATTTC TT - 
#GTGTCTGT 7 80 - - TTTGTATAGT TATCAATATT CATATTTATT TACAAGTTAT TCAGATCATT TT - 
#TTCTTTTC 84 0 - - TTTTTTTTTT TTTTTTTTTT TTTTACATCT TTAGTAGAGA CAGGGTTTCA CC - 
#ATATTGGC 900 - - CAGGCTGCTC TCAAACTCCT GACCTTGTGA TCCACCAGCC TCGGCCTCCC AA - 
#AGTGCTGG 960 - - GATTCATTTT TTCTTTTTAA TTTGCTCTGG GCTTAAACTT GTGGCCCAGC AC - 
#TTTATGAT 102 0 - - GGTACACAGA GTTAAGAGTG TAGACTCAGA CGGTCTTTCT TCTTTCCTTC TC - 
#TTCCTTCC 1080 - - TCCCTTCCCT CCCACCTTCC CTTCTCTCCT TCCTTTCTTT CTTCCTCTCT TG - 
#CTTCCTCA 114 0 - - GGCCTCTTCC AGTTGCTCCA AAGCCCTGTA CTTTTTTTTG AGTTAACGTC TT - 
#ATGGGAAG 12 0 0 - - GGCCTGCACT TAGTGAAGAA GTGGTCTCAG AGTTGAGTTA CCTTGGCTTC TG - 
#GGAGGTGA 12 6 0 - - AACTGTATCC CTATACCCTG AAGCTTTAAG GGGGTGCAAT GTAGATGAGA CC - 
#CCAACATA 1320 - - GATCCTCTTC ACAG -#-#-# 1334 - - - - (2) INFORMATION FOR SEQ ID 
NO: 45: - - (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 512 base - #pairs (B) TYPE: 
nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear - - (ii) MOLECULE TYPE: DNA 
(genomic) (A) DESCRIPTION: 7th - #MN intron - - (iii) HYPOTHETICAL: NO - - (iv) 
ANTI-SENSE: NO - - (xi) SEQUENCE DESCRIPTION: SEQ ID NO: - #45: - - GTGGGCCTGG 
GGTGTGTGTG GACACAGTGG GTGCGGGGGA AAGAGGATGT AA - #GATGAGAT 60 - - GAGAAACAGG
AGAAGAAAGA AATCAAGGCT GGGCTCTGTG GCTTACGCCT AT - #AATCCCAC 120 - - CACGTTGGGA 
GGCTGAGGTG GGAGAATGGT TTGAGCCCAG GAGTTCAAGA CA - #AGGCGGGG 18 0 - - CAACATAGTG 
TGACCCCATC TCTACCAAAA AAACCCCAAC AAAACCAAAA AT - #AGCCGGGC 24 0 - - ATGGTGGTAT 
GCGGCCTAGT CCCAGCTACT CAAGGAGGCT GAGGTGGGAA GA - #TCGCTTGA 3 00 - - TTCCAGGAGT 
TTGAGACTGC AGTGAGCTAT GATCCCACCA CTGCCTACCA TC - #TTTAGGAT 3 60 - - ACATTTATTT 
ATTTATAAAA GAAATCAAGA GGCTGGATGG GGAATACAGG AG - #CTGGAGGG 42 0 - - TGGAGCCCTG 
AGGTGCTGGT TGTGAGCTGG CCTGGGACCC TTGTTTCCTG TC - #ATGCCATG 480 - - AACCCACCCA
CACTGTCCAC TGACCTCCCT AG - # - # 512 - - - - (2) INFORMATION FOR SEQ ID NO: 46: - -

(i) SEQUENCE CHARACTERISTICS : (A) LENGTH: 114 base - #pairs (B) TYPE: nucleic acid (C) 
STRANDEDNESS: single (D) TOPOLOGY: linear - - (ii) MOLECULE TYPE: DNA (genomic) (A) 
DESCRIPTION: 8th - #MN intron - - (iii) HYPOTHETICAL: NO - - (iv) ANTI-SENSE: NO - - 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO : - #46: - - GTACAGCTTT GTCTGGTTTC CCCCCAGCCA 
GTAGTCCCTT ATCCTCCCAT GT - #GTGTGCCA 60 - - GTGTCTGTCA TTGGTGGTCA CAGCCCGCCT 
CTCACATCTC CTTTTTCTCT CC - #AG 114 - - - - (2) INFORMATION FOR SEQ ID NO: 47: - - (i) 
SEQUENCE CHARACTERISTICS: (A) LENGTH: 617 base - #pairs (B) TYPE: nucleic acid (C) 
STRANDEDNESS: single (D) TOPOLOGY: linear - - (ii) MOLECULE TYPE: DNA (genomic) (A) 
DESCRIPTION: 9th - #MN intron - - (iii) HYPOTHETICAL: NO - - (iv) ANTI -SENSE: NO - - 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: - #47: - - GTGAGTCTGC CCCTCCTCTT GGTCCTGATG 
CCAGGAGACT CCTCAGCACC AT - #TCAGCCCC 60 - - AGGGCTGCTC AGGACCGCCT CTGCTCCCTC 
TCCTTTTCTG CAGAACAGAC CC - #CAACCCCA 120 - - ATATTAGAGA GGCAGATCAT GGTGGGGATT 
CCCCCATTGT CCCCAGAGGC TA - #ATTGATTA 180 - - GAATGAAGCT TGAGAAATCT CCCAGCATCC 
CTCTCGCAAA AGAATCCCCC CC - #CCTTTTTT 240 - - TAAAGATAGG GTCTCACTCT GTTTGCCCCA 
GGCTGGGGTG TTGTGGCACG AT - #CATAGCTC 3 00 - - ACTGCAGCCT CGAACTCCTA GGCTCAGGCA 
ATCCTTTCAC CTTAGCTTCT CA - #AAGCACTG 360 - - GGACTGTAGG CATGAGCCAC TGTGCCTGGC 
CCCAAACGGC CCTTTTACTT GG - #CTTTTAGG 420 - - AAGCAAAAAC GGTGCTTATC TTACCCCTTC 
TCGTGTATCC ACCCTCATCC CT - #TGGCTGGC 4 80 - - CTCTTCTGGA GACTGAGGCA CTATGGGGCT 
GCCTGAGAAC TCGGGGCAGG GG - #TGGTGGAG 540 - - TGCACTGAGG CAGGTGTTGA GGAACTCTGC 
AGACCCCTCT TCCTTCCCAA AG - #CAGCCCTC 600 - - TCTGCTCTCC ATCGCAG -#-#-#617--- 

- (2) INFORMATION FOR SEQ ID NO: 48: - - (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 130
base - #pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear - - 

(ii) MOLECULE TYPE: DNA (genomic) (A) DESCRIPTION: 10th - #MN intron - - (iii) 
HYPOTHETICAL: NO - - (iv) ANTI -SENSE: NO - - (xi) SEQUENCE DESCRIPTION: SEQ ID NO : - 
#48: - - GTATTACACT GACCCTTTCT TCAGGCACAA GCTTCCCCCA CCCTTGTGGA GT - #CACTTCAT 60 - - 
GCAAAGCGCA TGCAAATGAG CTGCTCCTGG GCCAGTTTTC TGATTAGCCT TT - #CCTGTTGT 120 - - 
GTACACACAG - # - # - # 130 - - - - (2) INFORMATION FOR SEQ ID NO: 49: - - (i) SEQUENCE
CHARACTERISTICS: (A) LENGTH: 1401 base - #pairs (B) TYPE: nucleic acid (C) 
STRANDEDNESS: single (D) TOPOLOGY: linear - - (ii) MOLECULE TYPE: DNA (genomic) (A) 
DESCRIPTION: Spans - # 3' part of 1st intron to beyond end of - #5th exon - - (iii) 
HYPOTHETICAL: NO - - (iv) ANTI -SENSE: NO - - (xi) SEQUENCE DESCRIPTION: SEQ ID NO: - 
#49: - - CAAACTTTCA CTTTTGTTGC CCAGGCTGGA GTGCAATGGC GCGATCTCGG CT - #CACTGCAA 60 - -
CCTCCACCTC CCGGGTTCAA GTGATTCTCC TGCCTCAGCC TCTAGCCAAG TA - #GCTGCGAT 12 0 - - 
TACAGGCATG CGCCACCACG CCCGGCTAAT TTTTGTATTT TTAGTAGAGA CG - #GGGTTTCG 18 0 - - 
CCATGTTGGT CAGGCTGGTC TCGAACTCCT GATCTCAGGT GATCCAACCA CC - #CTGGCCTC 24 0 - - 
CCAAAGTGCT GGGATTATAG GCGTGAGCCA CAGCGCCTGG CCTGAAGCAG CC - #ACTCACTT 300 - - 
TTACAGACCC TAAGACAATG ATTGCAAGCT GGTAGGATTG CTGTTTGGCC CA - #CCCAGCTG 3 60 - - 
CGGTGTTGAG TTTGGGTGCG GTCTCCTGTG CTTTGCACCT GGCCCGCTTA AG - #GCATTTGT 42 0 - - 
TACCCGTAAT GCTCCTGTAA GGCATCTGCG TTTGTGACAT CGTTTTGGTC GC - #CAGGAAGG 480 - - 
GATTGGGGCT CTAAGCTTGA GCGGTTCATC CTTTTCATTT ATACAGGGGA TG - #ACCAGAGT 54 0 - - 
CATTGGCGCT ATGGAGGTGA GACACCCACC CGCTGCACAG ACCCAATCTG GG - #AACCCAGC 60 0 - - 
TCTGTGGATC TCCCCTACAG CCGTCCCTGA ACACTGGTCC CGGGCGTCCC AC - #CCGCCGCC 660 - - 
CACCGTCCCA CCCCCTCACC TTTTCTACCC GGGTTCCCTA AGTTCCTGAC CT - #AGGCGTCA 72 0 - - 
GACTTCCTCA CTATACTCTC CCACCCCAGG CGACCCGCCC TGGCCCCGGG TG - #TCCCCAGC 7 80 - - 
CTGCGCGGGC CGCTTCCAGT CCCCGGTGGA TATCCGCCCC CAGCTCGCCG CC - #TTCTGCCC 840 - - 
GGCCCTGCGC CCCCTGGAAC TCCTGGGCTT CCAGCTCCCG CCGCTCCCAG AA - #CTGCGCCT 900 - - 
GCGCAACAAT GGCCACAGTG GTGAGGGGGT CTCCCCGCCG AGACTTGGGG AT - #GGGGCGGG 960 - - 
GCGCAGGGAA GGGAACCGTC GCGCAGTGCC TGCCCGGGGG TTGGGCTGGC CC - #TACCGGGC 102 0 - - 
GGGGCCGGCT CACTTGCCTC TCCCTACGCA GTGCAACTGA CCCTGCCTCC TG - #GGCTAGAG 1080 - - 
ATGGCTCTGG GTCCCGGGCG GGAGTACCGG GCTCTGCAGC TGCATCTGCA CT - #GGGGGGCT 114 0 - - 
GCAGGTCGTC CGGGCTCGGA GCACACTGTG GAAGGCCACC GTTTCCCTGC CG - #AGGTGAGC 12 00 - - 
GCGGACTGGC CGAGAAGGGG CAAAGGAGCG GGGCGGACGG GGGCCAGAGA CG - #TGGCCCTC 12 60 - - 
TCCTACCCTC GTGTCCTTTT CAGATCCACG TGGTTCACCT CAGCACCGCC TT - #TGCCAGAG 13 2 0 - - 
TTGACGAGGC CTTGGGGCGC CCGGGAGGCC TGGCCGTGTT GGCCGCCTTT CT - #GGAGGTAC 13 80 - - 
CAGATCCTGG ACACCCCCTA C - # - # 1401 - - - - (2) INFORMATION FOR SEQ ID NO : 50: - - 

(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 98 amino - #acids (B) TYPE: amino acid (D) 
TOPOLOGY: linear - - (ii) MOLECULE TYPE: protein (A) DESCRIPTION: Regio - #n of 
homology to collagen alpha 1 chain - - (xi) SEQUENCE DESCRIPTION: SEQ ID NO: - #50: - 

- Gln Arg Leu Pro Arg Met Gln Glu - # Asp Ser Pro Leu Gly Gly Gly Ser 1 - # 5 - # 10 -
# 15 - - Ser Gly Glu Asp Asp Pro Leu Gly - # Glu Glu Asp Leu Pro Ser Glu Glu 20 - # 25
- # 30 - - Asp Ser Pro Arg Glu Glu Asp Pro - # Pro Gly Glu Glu Asp Leu Pro Gly 35 - #
40 - # 45 - - Glu Glu Asp Leu Pro Gly Glu Glu - # Asp Leu Pro Glu Val Lys Pro Lys 50 -
# 55 - # 60 - - Ser Glu Glu Glu Gly Ser Leu Lys - # Leu Glu Asp Leu Pro Thr Val Glu 65
- # 70 - # 75 - # 80 - - Ala Pro Gly Asp Pro Gln Glu Pro - # Gln Asn Asn Ala His Arg
Asp Lys - # 85 - # 90 - # 95 - - Glu Gly - - - - (2) INFORMATION FOR SEQ ID NO: 51: -

- (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 256 amino - #acids (B) TYPE: amino acid
(D) TOPOLOGY: linear - - (ii) MOLECULE TYPE: protein (A) DESCRIPTION: carbo - #nic
anhydrase domain - - (xi) SEQUENCE DESCRIPTION: SEQ ID NO: - #51: - - Asp Asp Gln Ser
His Trp Arg Tyr - # Gly Gly Asp Pro Pro Trp Pro Arg 1 - # 5 - # 10 - # 15 - - Val Ser
Pro Ala Cys Ala Gly Arg - # Phe Gln Ser Pro Val Asp Ile Arg 20 - # 25 - # 30 - - Pro
Gln Leu Ala Ala Phe Cys Pro - # Ala Leu Arg Pro Leu Glu Leu Leu 35 - # 40 - # 45 - -
Gly Phe Gln Leu Pro Pro Leu Pro - # Glu Leu Arg Leu Arg Asn Asn Gly 50 - # 55 - # 60

Detailed Description Paragraph Table (15) : 
Pro Pro - # Cys Ala Gln Gly Val Ile Trp Thr 195 - # 200 - # 205
- Val Phe Asn Gln Thr Val Met Leu - # Ser Ala Lys Gln Leu His Thr Leu 210 - # 215 - #
220 - - Ser Asp Thr Leu Trp Gly Pro Gly - # Asp Ser Arg Leu Gln Leu Asn Phe 225 - #
230 - # 235 - # 240 - - Arg Ala Thr Gln Pro Leu Asn Gly - # Arg Val Ile Glu Ala Ser
Phe Pro - # 245 - # 250 - # 255 - - - - (2) INFORMATION FOR SEQ ID NO: 52: - - (i)
SEQUENCE CHARACTERISTICS: (A) LENGTH: 20 amino - #acids (B) TYPE: amino acid (C)
STRANDEDNESS: (D) TOPOLOGY: linear - - (ii) MOLECULE TYPE: peptide (A) DESCRIPTION:
trans - #membrane region - - (xi) SEQUENCE DESCRIPTION: SEQ ID NO: - #52: - - Asp Ile
Leu Ala Leu Val Phe Gly - # Leu Leu Phe Ala Val Thr Ser Val 1 - # 5 - # 10 - # 15 - -
Ala Phe Leu Val 20 - - - - (2) INFORMATION FOR SEQ ID NO: 53: - - (i) SEQUENCE
CHARACTERISTICS: (A) LENGTH: 25 amino - #acids (B) TYPE: amino acid (C) STRANDEDNESS:
(D) TOPOLOGY: linear - - (ii) MOLECULE TYPE: peptide (A) DESCRIPTION: intra -
#cellular C-terminus - - (xi) SEQUENCE DESCRIPTION: SEQ ID NO: - #53: - - Met Arg Arg
Gln His Arg Arg Gly - # Thr Lys Gly Gly Val Ser Tyr Arg 1 - # 5 - # 10 - # 15 - - Pro
Ala Glu Val Ala Glu Thr Gly - # Ala 20 - # 25 - - - - (2) INFORMATION FOR SEQ ID NO:
54: - - (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 170 amino - #acids (B) TYPE: amino
acid (D) TOPOLOGY: linear - - (ii) MOLECULE TYPE: protein - - (xi) SEQUENCE
DESCRIPTION: SEQ ID NO: - #54: - - Arg Ala Leu Gln Leu His Leu His - # Trp Gly Ala Ala
Gly Arg Pro Gly 1 - # 5 - # 10 - # 15 - - Ser Glu His Thr Val Glu Gly His - # Arg Phe
Pro Ala Glu Ile His Val 20 - # 25 - # 30 - - Val His Leu Ser Thr Ala Phe Ala - # Arg
Val Asp Glu Ala Leu Gly Arg 35 - # 40 - # 45 - - Pro Gly Gly Leu Ala Val Leu Ala - #
Ala Phe Leu Glu Glu Gly Pro Glu 50 - # 55 - # 60 - - Glu Asn Ser Ala Tyr Glu Gln Leu -
# Leu Ser Arg Leu Glu Glu Ile Ala 65 - # 70 - # 75 - # 80 - - Glu Glu Gly Ser Glu Thr
Gln Val - # Pro Gly Leu Asp Ile Ser Ala Leu - # 85 - # 90 - # 95 - - Leu Pro Ser Asp
Phe Ser Arg Tyr - # Phe Gln Tyr Glu Gly Ser Leu Thr 100 - # 105 - # 110 - - Thr Pro
Pro Cys Ala Gln Gly Val - # Ile Trp Thr Val Phe Asn Gln Thr 115 - # 120 - # 125 - -
Val Met Leu Ser Ala Lys Gln Leu - # His Thr Leu Ser Asp Thr Leu Trp 130 - # 135 - #
140 - - Gly Pro Gly Asp Ser Arg Leu Gln - # Leu Asn Phe Arg Ala Thr Gln Pro 145 - #
150 - # 155 - # 160 - - Leu Asn Gly Arg Val Ile Glu Ala - # Ser Phe - # 165 - # 170 -

- - - (2) INFORMATION FOR SEQ ID NO: 55 : - - (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 
470 base - #pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear 

- - (ii) MOLECULE TYPE: RNA - - (xi) SEQUENCE DESCRIPTION: SEQ ID NO : - #55: - - 
CAUGGCCCCG AUAACCUUCU GCCUGUGCAC ACACCUGCCC CUCACUCCAC CC - #CCAUCCUA 60 - - 
GCUUUGGUAU GGGGGAGAGG GCACAGGGCC AGACAAACCU GUGAGACUUU GG - #CUCCAUCU 120 - - 
CUGCAAAAGG GCGCUCUGUG AGUCAGCCUG CUCCCCUCCA GGCUUGCUCC UC - #CCCCACCC 180 - - 
AGCUCUCGUU UCCAAUGCAC GUACAGCCCG UACACACCGU GUGCUGGGAC AC - #CCCACAGU 24 0 - - 
CAGCCGCAUG GCUCCCCUGU GCCCCAGCCC CUGGCUCCCU CUGUUGAUCC CG - #GCCCCUGC 3 00 - - 
UCCAGGCCUC ACUGUGCAAC UGCUGCUGUC ACUGCUGCUU CUGGUGCCUG UC - #CAUCCCCA 360 - - 
GAGGUUGCCC CGGAUGCAGG AGGAUUCCCC CUUGGGAGGA GGCUCUUCUG GG - #GAAGAUGA 42 0 - - 
CCCACUGGGC GAGGAGGAUC UGCCCAGUGA AGAGGAUUCA CCCAGAGAGG - # 470 - - - - (2) INFORMATION
FOR SEQ ID NO: 56: - - (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 292 base - #pairs (B) 
TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear - - (ii) MOLECULE 
TYPE: DNA (genomic) (A) DESCRIPTION: Alu - #repeat within MN genomic region - - (iii) 
HYPOTHETICAL: NO - - (iv) ANTI -SENSE: NO - - (xi) SEQUENCE DESCRIPTION: SEQ ID NO : - 
#56: - - GTTTTTTTGA GACGGAGTCT TGCATCTGTC ATGCCCAGGC TGGAGTAGCA GT - #GGTGCCAT 60 - - 
CTCGGCTCAC TGCAAGCTCC ACCTCCCGAG TTCACGCCAT TTTCCTGCCT CA - #GCCTCCCG 120 - -
AGTAGCTGGG ACTACAGGCG CCCGCCACCA TGCCCGGCTA ATTTTTTGTA TT - #TTTGGTAG 180 - - 
AGACGGGGTT TCACCGTGTT AGCCAGAATG GTCTCGATCT CCTGACTTCG TG - #ATCCACCC 24 0 - - 
GCCTCGGCCT CCCAAAGTTC TGGGATTACA GGTGTGAGCC ACCGCACCTG GC - # 292 - - - - (2)
INFORMATION FOR SEQ ID NO: 57: - - (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 262 base

- #pairs (B) TYPE: nucleic acid (C) STRANDEDNESS : single (D) TOPOLOGY: linear - - (ii) 
MOLECULE TYPE: DNA (genomic) (A) DESCRIPTION: Alu - #repeat within MN genomic region - 

- (iii) HYPOTHETICAL: NO - - (iv) ANTI -SENSE: NO - - (xi) SEQUENCE DESCRIPTION: SEQ ID 
NO: - #57: - - TTTCTTTTTT GAGACAGGGT CTTGCTCTGT CACCCAGGCC AGAGTGCAAT GG - #TACAGTCT 
60 - - CAGCTCACTG CAGCCTCAAC CGCCTCGGCT CAAACCATCA TCCCATTTCA GC - #CTCCTGAG 120 - - 
TAGCTGGGAC TACAGGCACA TGCCATTACA CCTGGCTAAT TTTTTTGTAT TT - #CTAGTAGA 18 0 - - 
GACAGGGTTT GGCCATGTTG CCCGGGCTGG TCTCGAACTC CTGGACTCAA GC - #AATCCACC 24 0 - - 
CACCTCAGCC TCCCAAAATG AG - # - # 262 - - - - (2) INFORMATION FOR SEQ ID NO : 58: - - 

(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 904 base - #pairs (B) TYPE: nucleic acid (C) 
STRANDEDNESS: single (D) TOPOLOGY: linear - - (ii) MOLECULE TYPE: DNA (genomic) - - 

(iii) HYPOTHETICAL: NO - - (iv) ANTI-SENSE: NO - - (xi) SEQUENCE DESCRIPTION: SEQ ID 
NO: - #5 8: - - GCTGGTCTCG AACTCCTGGA CTCAAGCAAT CCACCCACCT CAGCCTCCCA AA - #ATGAGGGA 
60 - - CCGTGTCTTA TTCATTTCCA TGTCCCTAGT CCATAGCCCA GTGCTGGACC TA - #TGGTAGTA 120 - - 
CTAAATAAAT ATTTGTTGAA TGCAATAGTA AATAGCATTT CAGGGAGCAA GA - #ACTAGATT 18 0 - - 
AACAAAGGTG GTAAAAGGTT TGGAGAAAAA AATAATAGTT TAATTTGGCT AG - #AGTATGAG 24 0 - - 
GGAGAGTAGT AGGAGACAAG ATGGAAAGGT CTCTTGGGCA AGGTTTTGAA GG - #AAGTTGGA 300 - - 
AGTCAGAAGT ACACAATGTG CATATCGTGG CAGGCAGTGG GGAGCCAATG AA - #GGCTTTTG 3 60 - - 
AGCAGGAGAG TAATGTGTTG AAAAATAAAT ATAGGTTAAA CCTATCAGAG CC - #CCTCTGAC 420 - -
ACATACACTT GCTTTTCATT CAAGCTCAAG TTTGTCTCCC ACATACCCAT TA - #CTTAACTC 480 - -
ACCCTCGGGC TCCCCTAGCA GCCTGCCCTA CCTCTTTACC TGCTTCCTGG TG - #GAGTCAGG 54 0 - - 
GATGTATACA TGAGCTGCTT TCCCTCTCAG CCAGAGGACA TGGGGGGCCC CA - #GCTCCCCT 600 - - 
GCCTTTCCCC TTCTGTGCCT GGAGCTGGGA AGCAGGCCAG GGTTAGCTGA GG - #CTGGCTGG 66 0 - - 
CAAGCAGCTG GGTGGTGCCA GGGAGAGCCT GCATAGTGCC AGGTGGTGCC TT - #GGGTTCCA 72 0 - - 
AGCTAGTCCA TGGCCCCGAT AACCTTCTGC CTGTGCACAC ACCTGCCCCT CA - #CTCCACCC 780 - - 
CCATCCTAGC TTTGGTATGG GGGAGAGGGC ACAGGGCCAG ACAAACCTGT GA - #GACTTTGG 840 - - 
CTCCATCTCT GCAAAAGGGC GCTCTGTGAG TCAGCCTGCT CCCCTCCAGG CT - #TGCTCCTC 900 - - CCCC - # 

- # - # 904 - - - - (2) INFORMATION FOR SEQ ID NO: 59: - - (i) SEQUENCE 
CHARACTERISTICS: (A) LENGTH: 292 base - #pairs (B) TYPE: nucleic acid (C)
STRANDEDNESS: single (D) TOPOLOGY: linear - - (ii) MOLECULE TYPE: DNA (genomic) - - 
(iii) HYPOTHETICAL: NO - - (iv) ANTI -SENSE: NO - - (xi) SEQUENCE DESCRIPTION: SEQ ID 
NO: - #5 9: - - TTTTTTTGAG ACGGAGTCTT GCATCTGTCA TGCCCAGGCT GGAGTAGCAG TG - #GTGCCATC 
60 - - TCGGCTCACT GCAAGCTCCA CCTCCCGAGT TCACGCCATT TTCCTGCCTC AG - #CCTCCCGA 120 - - 
GTAGCTGGGA CTACAGGCGC CCGCCACCAT GCCCGGCTAA TTTTTTGTAT TT - #TTGGTAGA 18 0 - - 
GACGGGGTTT CACCGTGTTA GCCAGAATGG TCTCGATCTC CTGACTTCGT GA - #TCCACCCG 240 - - 
CCTCGGCCTC CCAAAGTTCT GGGATTACAG GTGTGAGCCA CCGCACCTGG CC - # 292 - - - - (2)
INFORMATION FOR SEQ ID NO: 60: - - (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 262 base 

- #pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear - - (ii) 
MOLECULE TYPE: DNA (genomic) - - (iii) HYPOTHETICAL: NO 

Detailed Description Paragraph Table (16) : 

- - (iv) ANTI -SENSE: NO - - (xi) SEQUENCE DESCRIPTION: SEQ ID NO : - #60: - - 
TTCTTTTTTG AGACAGGGTC TTGCTCTGTC ACCCAGGCCA GAGTGCAATG GT - #ACAGTCTC 60 - - 
AGCTCACTGC AGCCTCAACC GCCTCGGCTC AAACCATCAT CCCATTTCAG CC - #TCCTGAGT 120 - - 
AGCTGGGACT ACAGGCACAT GCCATTACAC CTGGCTAATT TTTTTGTATT TC - #TAGTAGAG 180 - - 
ACAGGGTTTG GCCATGTTGC CCGGGCTGGT CTCGAACTCC TGGACTCAAG CA - #ATCCACCC 24 0 - - 
ACCTCAGCCT CCCAAAATGA GG - # - # 262 - - - - (2) INFORMATION FOR SEQ ID NO: 61: - -

(i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 294 base - #pairs (B) TYPE: nucleic acid (C) 
STRANDEDNESS: single (D) TOPOLOGY: linear - - (ii) MOLECULE TYPE: DNA (genomic) - - 

(iii) HYPOTHETICAL: NO - - (iv) ANTI-SENSE: NO - - (xi) SEQUENCE DESCRIPTION: SEQ ID 
NO: - #61: - - TTTTTTTTTG AGACAAACTT TCACTTTTGT TGCCCAGGCT GGAGTGCAAT GG - #CGCGATCT 
60 - - CGGCTCACTG CAACCTCCAC CTCCCGGGTT CAAGTGATTC TCCTGCCTCA GC - #CTCTAGCC 120 - - 
AAGTAGCTGC GATTACAGGC ATGCGCCACC ACGCCCGGCT AATTTTTGTA TT - #TTTAGTAG 180 - - 
AGACGGGGTT TCGCCATGTT GGTCAGGCTG GTCTCGAACT CCTGATCTCA GG - #TGATCCAA 24 0 - - 
CCACCCTGGC CTCCCAAAGT GCTGGGATTA TAGGCGTGAG CCACAGCGCC TG - #GC 2 94 - - - - (2) 
INFORMATION FOR SEQ ID NO: 62 : - - (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 276 base 

- #pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear - - (ii) 
MOLECULE TYPE: DNA (genomic) - - (iii) HYPOTHETICAL: NO - - (iv) ANTI-SENSE: NO - - 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO : - #62: - - TGACAGTCTC TCTGTCGCCC AGGCTGGAGT 
GCAGTGGTGT GATCTTGGGT CA - #CTGCAACT 60 - - TCCGCCTCCC GGGTTCAAGG GATTCTCCTG 
CCTCAGCTTC CTGAGTAGCT GG - #GGTTACAG 12 0 - - GTGTGTGCCA CCATGCCCAG CTAATTTTTT 
TTTGTATTTT TAGTAGACAG GG - #TTTCACCA 180 - - TGTTGGTCAG GCTGGTCTCA AACTCCTGGC
CTCAAGTGAT CCGCCTGACT CA - #GCCTACCA 24 0 - - AAGTGCTGAT TACAAGTGTG AGCCACCGTG CCCAGC - 

# - # 276 - - - - (2) INFORMATION FOR SEQ ID NO: 63 : - - (i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 289 base - #pairs (B) TYPE: nucleic acid (C) STRANDEDNESS : single (D) 
TOPOLOGY: linear - - (ii) MOLECULE TYPE: DNA (genomic) - - (iii) HYPOTHETICAL: NO - - 

(iv) ANTI-SENSE: NO - - (xi) SEQUENCE DESCRIPTION: SEQ ID NO: - #63: - - CGCCGGGCAC 
GGTGGCTCAC GCCTGTAATC CCAGCACTTT GGGAGGCCAA GG - #CAGGTGGA 60 - - TCACGAGGTC 
AAGAGATCAA GACCATCCTG GCCAACATGG TGAAACCCCA TC - #TCTACTAA 12 0 - - AAATACGAAA 
AAATAGCCAG GCGTGGTGGC GGGTGCCTGT AATCCCAGCT AC - #TCGGGAGG 180 - - CTGAGGCAGG 
AGAATGGCAT GAACCCGGGA GGCAGAAGTT GCAGTGAGCC GA - #GATCGTGC 24 0 - - CACTGCACTC 
CAGCCTGGGC AACAGAGCGA GACTCTTGTC TCAAAAAAA -#289---- (2) INFORMATION FOR SEQ ID 
NO: 64: - - (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 298 base - #pairs (B) TYPE: 
nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear - - (ii) MOLECULE TYPE: DNA 

(genomic) - - (iii) HYPOTHETICAL: NO - - (iv) ANTI-SENSE: NO - - (xi) SEQUENCE 
DESCRIPTION: SEQ ID NO: - #64: - - AGGCTGGGCT CTGTGGCTTA CGCCTATAAT CCCACCACGT 
TGGGAGGCTG AG - #GTGGGAGA 60 - - ATGGTTTGAG CCCAGGAGTT CAAGACAAGG CGGGGCAACA 
TAGTGTGACC CC - #ATCTCTAC 12 0 - - CAAAAAAACC CCAACAAAAC CAAAAATAGC CGGGCATGGT 
GGTATGCGGC CT - #AGTCCCAG 18 0 - - CTACTCAAGG AGGCTGAGGT GGGAAGATCG CTTGATTCCA 
GGAGTTTGAG AC - #TGCAGTGA 24 0 - - GCTATGATCC CACCACTGCC TACCATCTTT AGGATACATT 
TATTTATTTA TA - #AAAGAA 298 - - - - (2) INFORMATION FOR SEQ ID NO : 65: - - (i) 
SEQUENCE CHARACTERISTICS: (A) LENGTH: 105 base - #pairs (B) TYPE: nucleic acid (C) 
STRANDEDNESS: single (D) TOPOLOGY: linear - - (ii) MOLECULE TYPE: DNA (genomic) - - 

(iii) HYPOTHETICAL: NO - - (iv) ANTI-SENSE: NO - - (xi) SEQUENCE DESCRIPTION: SEQ ID 
NO: - #65: - - TTTTTTACAT CTTTAGTAGA GACAGGGTTT CACCATATTG GCCAGGCTGC TC - #TCAAACTC 
60 - - CTGACCTTGT GATCCACCAG CCTCGGCCTC CCAAAGTGCT GGGAT -#105---- (2) 
INFORMATION FOR SEQ ID NO: 66 : - - (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 83 base - 
#pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear - - (ii) 
MOLECULE TYPE: DNA (genomic) - - (iii) HYPOTHETICAL: NO - - (iv) ANTI-SENSE: NO - - 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: - #66: - - CCTCGAACTC CTAGGCTCAG GCAATCCTTT 
CACCTTAGCT TCTCAAAGCA CT - #GGGACTGT 60 - - AGGCATGAGC CACTGTGCCT GGC -#-#83 - - - 

- (2) INFORMATION FOR SEQ ID NO: 67 : - - (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 11 
base - #pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear - - 
(ii) MOLECULE TYPE: DNA (genomic) (A) DESCRIPTION: 5'- # donor consensus splice 
sequence - - (xi) SEQUENCE DESCRIPTION: SEQ ID NO: - #67: - - AGAAGGTAAG T - # - # - # 
11 - - - - (2) INFORMATION FOR SEQ ID NO: 68: - - (i) SEQUENCE CHARACTERISTICS: (A)
LENGTH: 11 base - #pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: 
linear - - (ii) MOLECULE TYPE: DNA (genomic) (A) DESCRIPTION: 5'- # donor consensus 
splice sequence - - (xi) SEQUENCE DESCRIPTION: SEQ ID NO: - #68: - - TGGAGGTGAG A - # 
- # - # 11 - - - - (2) INFORMATION FOR SEQ ID NO: 69: - - (i) SEQUENCE

CHARACTERISTICS: (A) LENGTH: 11 base - #pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: 
single (D) TOPOLOGY: linear - - (ii) MOLECULE TYPE: DNA (genomic) (A) DESCRIPTION: 5'- 

# donor consensus splice sequence - - (xi) SEQUENCE DESCRIPTION: SEQ ID NO : - #69: - - 
CAGTCGTGAG G - # - # - # 11 - - - - (2) INFORMATION FOR SEQ ID NO: 70: - - (i)
SEQUENCE CHARACTERISTICS: (A) LENGTH: 11 base - #pairs (B) TYPE: nucleic acid (C) 
STRANDEDNESS: single (D) TOPOLOGY: linear - - (ii) MOLECULE TYPE: DNA (genomic) (A) 
DESCRIPTION: 5'- # donor consensus splice sequence - - (xi) SEQUENCE DESCRIPTION: SEQ 
ID NO: - #70: - - CCGAGGTGAG C - # - # - # 11 - - - - (2) INFORMATION FOR SEQ ID NO:
71: - - (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 11 base - #pairs (B) TYPE: nucleic 
acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear - - (ii) MOLECULE TYPE: DNA 
(genomic) (A) DESCRIPTION: 5'- # donor consensus splice sequence - - (xi) SEQUENCE 
DESCRIPTION: SEQ ID NO: - #71: - - TGGAGGTACC A - # - # - # 11 - - - - (2) INFORMATION 
FOR SEQ ID NO: 72 : - - (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 11 base - #pairs (B) 
TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear - - (ii) MOLECULE 
TYPE: DNA (genomic) (A) DESCRIPTION: 5'- # donor consensus splice sequence - - (xi) 
SEQUENCE DESCRIPTION: SEQ ID NO: - #72: - - GGAAGGTCAG T - # - # - # 11 - - - - (2)
INFORMATION FOR SEQ ID NO: 73 : - - (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 11 base - 
#pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear - - (ii) 
MOLECULE TYPE: DNA (genomic) (A) DESCRIPTION: 5'- # donor consensus splice sequence - 

- (xi) SEQUENCE DESCRIPTION: SEQ ID NO: - #73: - - AGCAGGTGGG C - # - # - # 11 - - - -
(2) INFORMATION FOR SEQ ID NO: 74 : - - (i) SEQUENCE CHARACTERISTICS: (A) LENGTH: 11 
base - #pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: linear - - 
(ii) MOLECULE TYPE: DNA (genomic) (A) DESCRIPTION: 5'- # donor consensus splice 
sequence - - (xi) SEQUENCE DESCRIPTION: SEQ ID NO: - #74: - - GCCAGGTACA G - # - # - # 
11 - - - - (2) INFORMATION FOR SEQ ID NO: 75 : - - (i) SEQUENCE CHARACTERISTICS: (A) 
LENGTH: 11 base - #pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: single (D) TOPOLOGY: 
linear - - (ii) MOLECULE TYPE: DNA (genomic) (A) DESCRIPTION: 5'- # donor consensus
splice sequence - - (xi) SEQUENCE DESCRIPTION: SEQ ID NO: - #75: - - TGCTGGTGAG T - # 

- # - # 11 - - - - (2) INFORMATION FOR SEQ ID NO: 76: - - (i) SEQUENCE 

CHARACTERISTICS: (A) LENGTH: 11 base - #pairs (B) TYPE: nucleic acid (C) STRANDEDNESS : 
single (D) TOPOLOGY: linear - - (ii) MOLECULE TYPE: DNA (genomic) (A) DESCRIPTION: 5'- 

# donor consensus splice sequence - - (xi) SEQUENCE DESCRIPTION: SEQ ID NO: - #76: - - 
ATACAGGGGA T - # - # - # 11 - - - - (2) INFORMATION FOR SEQ ID NO: 77: - - (i) SEQUENCE
CHARACTERISTICS: (A) LENGTH: 11 base - #pairs (B) TYPE: nucleic acid (C) STRANDEDNESS: 
single (D) TOPOLOGY: linear - - (ii) MOLECULE TYPE: DNA (genomic) (A) DESCRIPTION: 3'- 

# acceptor consensus splice sequence - - (xi) SEQUENCE DESCRIPTION: SEQ ID NO : - #77: 

- - ATACAGGGGA T - # - # - # 11 - - - - (2) INFORMATION FOR SEQ ID NO: 78: - - (i)
SEQUENCE CHARACTERISTICS: 
DOCUMENT-IDENTIFIER: US 5534438 A

TITLE: Process for isolating genes and the gene causative of Huntington's disease and 
differential 3' polyadenylation in the gene 



Priority Application Year (1) : 
1993 



Abstract Text (1) : 

The underlying genetic defect of Huntington disease (HD) has been mapped to
chromosomal band 4p16.3. Refined localization using recombinant HD chromosome
analysis and allelic association analyses has identified two distinct candidate
regions. Using a cDNA hybrid selection procedure, .alpha.-adducin has been mapped to
the proximal 2.2 Mb HD gene candidate region within 20 kb of D4S95. Several clones
have been mapped within the minimal region containing the HD gene. The clones GT 70 
and GT 149 are particularly useful in detecting changes in this portion of the gene of 
HD patients. 

Brief Summary Text (9) : 

In order to facilitate a description of various embodiments of the invention, FIGS. 13 
and 15 of the Drawings show DNA sequences of GT 70, GT 149 and UTR of HD 14, 
respectively. A detailed description of the drawings follows hereinafter.

Brief Summary Text (10) : 

Many aspects of the invention may be used to develop information respecting HD. 
Various clones of the HD gene and surrounding DNA sequences are valuable in gene 
diagnosis and family studies. According to an aspect of the invention, gene clones GT 
70 and GT 149 are particularly useful in detecting changes or re-arrangements in the 
HD gene to determine a patient's susceptibility to HD.

Brief Summary Text (11) : 

According to another aspect of the invention the HD gene includes cDNA clones GT 70 
and GT 149, as shown in FIG. 13.

Brief Summary Text (12) : 

Another aspect of the present invention is a novel purified cDNA molecule having the 
sequence equivalent to GT 70. 

Brief Summary Text (13) : 

Another aspect of the present invention is a novel purified cDNA molecule having the 
sequence equivalent to GT 149.

Drawing Description Text (14) : 

FIG. 4a. Mapping of transcriptional units within the HD candidate region. Overlapping 
regions with Yacs 353G6, 70D11 and 2A11 were used to define 5 separate genomic BINS. 
Yacs A187G12 and D102A10 were used to further refine BIN 3 into three separate 
compartments: A, B, C. GT clones were mapped by hybridization to digested Yac DNA and
assigned to BINS accordingly. GT 44, 48 and 49 mapped to both A187G12 and D102A10, as
well as 70D11. All of these clones were contained within a .lambda. phage
(.lambda.GT48) isolated using GT 48 as a probe. .lambda.GT48 contains a HindIII
polymorphism (*) detected by both GT 44 and GT 48. FIG. 4b. Yac mapping of GT 44,
illustrating the HindIII polymorphism. FIG. 4c. GT 24 hybridizes only to 70D11 and
D102A10 indicating its position within BIN 3C. 
Drawing Description Text (17) :

Northern blot analysis of GT clones. Examples of mRNAs detected with GT clones 
originating from the candidate HD region are shown. Total RNA from each cell line or 
tissue was prepared by standard procedures. The lanes represent RNA from the following 
sources: 1) Caco-2 intestinal cells, 1A) Caco-2 poly A.sup.+ RNA, 2) HL60 cells, 2A)
HL60 poly A.sup.+, 3) lymphoblasts, 4) fibroblasts, 5) liver, 6) Cos cells, 7) frontal
cortex, 8) fetal brain, 10) Caco-2 intestinal cells. RNA was separated on 1% agarose
gels containing 0.6M formaldehyde and transferred onto DX (Amersham) membranes. The
integrity of the RNA is shown by the ethidium bromide stained gel in the left upper
panel. Clones were radiolabeled by random priming, and hybridization and washing
were carried out as previously described. The size of the message detected
with each clone is indicated in kilobases. 

Drawing Description Text (19) : 

Genomic rearrangement in two families with HD. Southern blot analysis of Msp I 
digested genomic DNA probed with GT 48 revealed an altered band in 2 of 250. FIGS. 7a
and 7b show co-segregation of the altered 1.7 kb Msp I fragment with all affected
individuals in both families. FIG. 7c Southern blot analysis of genomic DNA from one
affected individual from each family (lanes 1 and 2) and a control (lane C). Genomic
DNA digested with a variety of enzymes and probed with GT 48 resulted in altered bands
identical in the affected individuals from the two families. 

Drawing Description Text (21) : 

Alu retrotransposition within the HD candidate region. Mapping of the genomic region
around GT 48 in controls and the affected individuals localized the rearrangement to
the 1.2 kb HindIII fragment on .lambda.GT48 (boxed). The 1.2 kb HindIII fragment (SEQ
ID NO: 5) was subcloned and sequenced, and PCR primers spanning the insertion site were
derived. These primers (A: ATGTAATTGTTCACGACATGTGGC (SEQ ID NO:13),
B: AAATAACATCCAGAATCTTCAGAT (SEQ ID NO:14)) generated a 118 base pair fragment in
normal individuals (FIG. 8b lanes 6-9) and a 460 base pair product in five affected
individuals from both families (FIG. 8b lanes 1-5). The 460 base pair PCR product was
subcloned (TA cloning, Invitrogen) and sequenced by ABI automated sequencing. The
inserted sequence represents a full length Alu element (bold) and the insertion site
is flanked by a 9 base pair direct repeat (underlined).
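
The two PCR product sizes quoted above imply the length of the inserted sequence. The following Python sketch works through that arithmetic. It is illustrative only: the function names and the 10 base pair sizing tolerance are hypothetical and are not part of the described method, and subtracting the 9 base pair repeat assumes that the insertion contributes exactly one extra copy of the direct repeat.

```python
# Product sizes and repeat length as quoted in the text (base pairs).
NORMAL_PRODUCT_BP = 118
INSERTION_PRODUCT_BP = 460
DIRECT_REPEAT_BP = 9


def inserted_length(normal_bp, insertion_bp):
    """Extra sequence carried by the larger PCR product."""
    return insertion_bp - normal_bp


def classify_product(size_bp, tolerance_bp=10):
    """Hypothetical helper: label an observed band by the nearest expected size.

    tolerance_bp is an assumed sizing error, not a value from the text.
    """
    if abs(size_bp - NORMAL_PRODUCT_BP) <= tolerance_bp:
        return "normal"
    if abs(size_bp - INSERTION_PRODUCT_BP) <= tolerance_bp:
        return "insertion"
    return "unexpected"


extra = inserted_length(NORMAL_PRODUCT_BP, INSERTION_PRODUCT_BP)
print(extra)                     # 342
print(extra - DIRECT_REPEAT_BP)  # 333 (assumes one extra repeat copy)
```

Under these assumptions the larger product carries 342 additional base pairs, which is in the size range expected for a full length Alu element together with one copy of the flanking repeat.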

Drawing Description Text (23) : 

Physical Map between D4S95 and D4S182: Long range physical mapping localized GT 24, GT
48 and the .alpha.-adducin cDNA clone to the same 60 kb Not I fragment. Cosmids J7 and
B7 were isolated from a chromosome 4 specific library with D4S182 and .alpha.-adducin
cDNA respectively. .lambda.GT48 and .lambda.GT24 were isolated from a .lambda. phage
library using their respective GT clones. .lambda.GT48 and .lambda.SS2 form a contig
overlapping with cosmid B7. An oligonucleotide from the 5' UTR of adducin detected the
D4S182 cosmid J7 as well as .lambda.GT24. By physical mapping GT 48 is approximately
20 kb from GT 24.

Drawing Description Text (27) : 

Assignment of cDNA fragments to BINs by hybridization to overlapping YAC clones in the 
proposed region for the Huntington disease gene (A). The physical intervals defined by
overlapping regions of 10 YAC clones are indicated as BINs. Each cDNA fragment was
hybridized to all or a subset of the overlapping YACs such that they could be assigned
to the defined regions. Two cDNA fragments (B) GT 70 and (C) GT 48 are shown
hybridized to overlapping YACs digested with HindIII.

Drawing Description Text (29) : 

RNA hybridization analysis of 5 retrieved cDNA fragments from the candidate region. An 
ethidium bromide stained gel is shown in the upper left panel. The clone names, their 
physical interval (BIN) assignment and the size of the mRNAs that were detected (in 
kilobases) are indicated. Hybridization to RNA from Caco-2 (intestinal), HL60, 
Lymphoblast, Fibroblast cell lines or from frontal cortex RNAs are shown. Part of the 
analysis with GT 70 has been shown previously, but sizes of the bands have been 
reassessed.

Drawing Description Text (31) : 

Sequence analysis of GT 70 and GT 149. Two of the retrieved clones detected a pair of 
large transcripts by hybridization to RNA. These clones did not overlap and were
mapped to adjacent physical intervals defined by the overlapping YACs. They contained
multiple exons, demonstrated strong cross species conservation and upon sequencing
analysis displayed significant coding potential (underlined). In the listings, the
letter "n" designates an unidentified nucleotide. 

Drawing Description Text (33) : 

An illustration of identified cDNAs and their nucleotide positions corresponding to
the HD sequence. GT 63, 70 and 149 are the fragments of the gene initially identified
by gene tracking.

Drawing Description Text (41) : 

Al. Hybridization of GT 70 to poly A.sup.+ RNA from fetal brain and Caco-2
(intestinal) cell line and to total RNA from Caco-2 and Hep G2 cell lines reveals 2
transcripts with the larger transcript most predominant in brain and the smaller more 
abundant in the cell lines. 

Detailed Description Text (4) : 

In spite of the limitation of using only four tissue sources, the combined length of 
the transcripts detected with the GT clones contained within the 70D11 YAC comprising 
450 kilobases of genomic DNA adds to greater than 30 kilobases, indicating that a 
minimum of 7% of genomic DNA in this region is transcribed. This corresponds to the 
overall expected proportion of transcribed sequence, but in all likelihood does not 
correspond to all the genes in this region. 
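
The 7% figure follows from dividing the combined transcript length by the size of the genomic segment. A minimal sketch of that arithmetic, using the stated lower bound of 30 kilobases (the function name is illustrative only and is not part of the original work):

```python
def transcribed_fraction(transcript_kb: float, region_kb: float) -> float:
    """Fraction of a genomic region accounted for by detected transcripts."""
    return transcript_kb / region_kb


# Values from the text: more than 30 kb of transcripts within 450 kb of genomic DNA.
fraction = transcribed_fraction(30.0, 450.0)
print(f"{fraction:.1%}")  # 6.7%, which rounds to the stated minimum of 7%
```

Because 30 kilobases is itself a lower bound on transcript length, the result is a lower bound on the transcribed proportion.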

Detailed Description Text (5) : 

A number of cDNA clones were obtained that did not detect mRNAs by analysing total RNA 
of the source tissues. However, sequence analysis and their hybridization patterns 
strongly suggested that these clones were portions of genes. For example, GT 133, in 
BIN 4, detects multiple exons on genomic DNA but did not detect a message in total RNA 
from the tissues tested. 

Detailed Description Text (8) : 

Our strategy has been to initially use these GT clones to screen cDNA libraries and to 
screen DNA and RNA from many HD patients in an effort to further refine the assessment 
of candidate genes. In this light, GT 24, which detects a large transcript close
to an Alu retrotransposition event, deserved further investigation. GT clones showing
multiple bands on Southern blot hybridization with excellent coding potential also
warranted further consideration. For example, GT 70, which detects a transcription
unit with excellent coding potential, detects several genomic fragments, sees two
distinct RNA species and also detects DNA changes or rearrangements in patients with
HD.

Detailed Description Text (9) : 

GT 149 also detects transcriptional units and also has excellent coding
potential. The two transcriptional units are the same as those detected by GT 70. The
two distinct mRNA species have sizes of 10.3 kb and 13.7 kb, respectively.
These forms of the mRNA are due to variations in the 3' untranslated region (UTR)
of the HD gene. It is believed that the larger transcript, which is present in the
human brain in significantly increased amounts and is derived from the HD gene
including the UTR of HD 14, is closely associated with Huntington's Disease. The 3' UTR of
HD 14 provides a useful entity for the detection, analysis and prognosis of
Huntington's Disease in humans due to the selective increased expression of this
entity in the human brain.

Detailed Description Text (10) : 

A transcriptional map as described in more detail in the Examples and used to develop 
the strategy in locating GT 70, GT 149 and HD 14, is equally applicable to any other 
genomic region and will greatly assist in the search for any disease gene. 
Furthermore, by cloning the disease gene, the development of a detailed transcription 
map of a particular region allows further assessment of the possible regulatory 
inter-relationships between genes in that region. In addition, antisense RNA or DNA 
can be provided to bind specifically with the HD gene mRNA, thereby interrupting the 
precise molecular choreography which expresses the gene as a protein. The antisense
material provides a very useful form of gene therapy to possibly arrest HD progression
in the brain and other tissue (J. J. Toulme et al., Gene Vol. 72, No. 1, pg. 51-58,
December 1988).

Detailed Description Text (58) : 

Using a transcription map derived from the defined region we also obtained candidate 
genes for HD. To construct the map, three overlapping YACs were used which spanned the
entire region of interest extending approximately 0.5 Mb proximal and distal from the
D4S95 locus, the marker which most consistently shows non-random allelic association
with HD. A total of 50 cDNA clones were isolated using direct cDNA selection. A total
of 250 HD patients were screened with a series of cDNA clones (GT), one of which (GT
48) revealed an insertion of an Alu repetitive element in two families with identical
DNA marker haplotypes on their HD chromosomes. In addition to complete segregation
with HD in these two families, the insertion is not seen in 1000 control chromosomes
in the general population. This includes 14/687 persons with an identical core
haplotype suggesting a causal relationship between this rearrangement and HD. The
insertion site is immediately adjacent to two overlapping transcriptional units
including .alpha.-adducin and another which encodes a 12 kb transcript.

Detailed Description Text (66) : 

In addition to refined physical mapping, the clones were also categorized into 
transcription units by cross-hybridization to each other and to RNA from a variety of
tissues and cell lines. The results for seven GT clones are shown in FIG. 6. Of the
clones that were isolated from the 70D11 YAC, one group was found to correspond to the
.alpha.-adducin message previously identified.sup.12.

Detailed Description Text (70) : 

We have screened for rearrangements with those GT clones that map to BIN 3. One GT 
clone, GT 48, detects an insertion of approximately 330 bp in 2 of 250 HD patients.
This rearrangement segregated with HD in both families (FIG. 7a, 7b) and was seen in
genomic DNA digested with multiple enzymes (FIG. 7c). Interestingly, in one of these
families (FIG. 7a) recombination had placed the HD gene distal to D4S125 (FIG. 5).

Detailed Description Text (73) : 

Detailed restriction mapping localized this rearrangement to a 1.2 kb HindIII fragment
which contained a portion of GT 48 (FIG. 8a). Sequence analysis of the rearrangement
in both families demonstrated an insertion element of 331 base pairs which is a member
of the Alu family of mobile repetitive elements. With primers flanking the insertion
site, the inserted element could be detected using PCR (FIG. 8b).

Detailed Description Text (76) : 

As previously described, several GT clones allowed the identification of cDNA for the 
.alpha.-adducin gene.sup.12. The 3' UTR of .alpha.-adducin maps 20 kb telomeric to
D4S95.sup.12 (FIG. 9). An oligonucleotide primer which spans nucleotides 38-58 in the
5' untranslated region of the .alpha.-adducin gene maps telomeric to the Alu insertion
and is located on the same 7.4 kb EcoRI fragment as GT 24 but does not hybridize to GT
24 (FIG. 9). In addition, a 501 bp RT-PCR product corresponding to nucleotides 38-539
of the .alpha.-adducin cDNA also detected the 7.4 kb EcoRI fragment. This places the
5' UTR of the .alpha.-adducin gene in close proximity to D4S182, flanking GT 24, and
indicates that the .alpha.-adducin gene spans at least 80 kb between D4S95 and D4S182
(FIG. 9).

Detailed Description Text (78) : 

Corresponding transcript(s) for GT 48 and the two other adjacent clones, GT 44 and GT
49, were not detected. Northern blot analysis and screening of 10 different cDNA
libraries with these cDNA clones did not yield any positive results. Sequence analysis
of the 1.2 kb HindIII fragment containing GT 48 did not reveal a significant coding
potential.

Detailed Description Text (79) : 

Nevertheless, the presence of a new Alu element might interfere with expression of 
other genes near the site of insertion. We therefore focused our attention on two 
other cDNA clones, GT 24 and GT 34. Northern blot analysis showed that GT 34 detected
a 4 kb transcript in a variety of tissues including brain, lymphoblasts and
fibroblasts. A 4 kb cDNA clone (cD510) was then isolated with GT 34 as probe. Sequence
analysis of this cDNA clone revealed no homology with sequences in GenBank. Further
mapping data showed that the genomic DNA sequence corresponding to cD510 mapped distal
to D4S95, but centromeric to the 3' UTR of .alpha.-adducin and at least 70 kb from the
site of the Alu insertion (FIG. 4). Based on the map location, therefore, cD510 became
an unlikely candidate for the HD gene. 

Detailed Description Text (80) : 

The third clone, GT 24, was mapped approximately 20 kb from GT 48 (FIG. 9). Although 
GT 24 is also contained in an intron of the .alpha.-adducin gene, it detected a
different transcript of 12 kb (FIG. 4, FIG. 6) in many tissues including frontal
cortex, fibroblasts, lymphoblasts, and intestinal cells (CaCO-2). Besides some weak
identity with the LINE-1 element, this clone also has no homology with any sequence in
the data bases. However, a 69 bp open reading frame flanked by appropriate splice
junctions was noted.sup.23. Furthermore, based on its map position close to the Alu
insertion site, the 12 kb transcript is a candidate gene for HD. 

Detailed Description Text (96) : 

Manual or automated (ABI 373A) sequence data were obtained and entered into a Sun 
Microsystems Sparc IPX workstation and compared with previously entered sequence data 
(of GT clones) using the XDAP module of the Staden package. Sequence data were then 
sent to the e-mail server at the National Center for Biotechnology Information (NCBI) 
and compared with the non-redundant GenBank, dbEST, Macvector and Transcription Factor 
databases using the BLAST suite of programs. The CRM module of the Gene Recognition 
Analysis Internet Link (GRAIL) e-mail server was used to assess protein-coding 
potential and a search for open reading frames bracketed by splice junctions was 
conducted with the SORFIND program. The PYTHIA e-mail server was used to identify and 
classify known human repeat elements. 

Detailed Description Text (102) : 

A series of additional overlapping YACs were also used to define physical intervals or 
BINs across the 1 megabase region as depicted in FIG. 11A. Refined positioning of each 
cDNA was deduced by the hybridization pattern to this array of YACs. For example, the 
hybridization pattern of clone GT 70 (FIG. 11B) is consistent with it originating from
the overlapping portions of the 353G6 and 70D11 YACs, in BIN 2. As well, this clone
detected multiple bands indicating that it contains more than a single exon and also
displayed striking cross species hybridization. The GT 48 clone (FIG. 11C) detected
two HindIII restriction fragments in three of the YACs suggesting it originates from
BIN 3B. It detected only a single EcoRI genomic fragment and did not show cross
species hybridization. An additional 56 clones were mapped in a similar manner and the
results are listed in Table 3.

Detailed Description Text (103) : 

Refined map position was obtained for two cDNA fragments which were located at the 
ends of the 70D11 YAC. The hybridization pattern seen with GT 70 on the different YACs
(FIG. 11B) and chromosome 4 hybrid and human DNAs (data not shown) indicates that this
clone in all likelihood maps to the end of the human DNA segment in the 70D11 YAC and 
is entirely contained within the other YACs to which it hybridized including 353G6.
Through a similar analysis the clone GT 133 from BIN 4 was found to originate from the 
other end of the 70D11 YAC. 

Detailed Description Text (106) : 

The combined information of RNA hybridization and physical mapping clearly indicates
that some of the GT clones were portions of the same transcription units. GT 70 and GT
149 (FIG. 12), for example, both detect the same distinct pair of very large
transcripts (10 and 12 kilobases). Furthermore, GT 70 and GT 149 map close to each
other (FIG. 11 and Table 3), but they do not cross-hybridize nor overlap by sequence
analysis. Both GT 70 and GT 149 have excellent coding potential as judged by the GRAIL
e-mail server (FIG. 13 and Table 3). Furthermore, GT 63 hybridized to EcoRI fragments
that were identical in size to those detected by GT 70 and was found by
cross-hybridization to overlap with it (Table 3).

Detailed Description Text (108) : 

Overlapping clones were found by cross-hybridization of individual clones to all
others or by sequence analysis. For example, GT 98, which detects a 3.6 kb transcript,
hybridized to two other clones in BIN 5 (Table 3). One of these, GT 123, is also
located in BIN 5 but only weakly cross-hybridized to GT 98 and does detect a
transcript of identical size. That these clones overlap was also supported by
examining the EcoRI genomic restriction fragments to which these clones hybridized
(Table 3).

Detailed Description Text (109) : 

Sequencing indicated the majority of clones selected were independently derived. Some 
of the overlapping clones (Table 3) detected abundant mRNAs. An exception was noted
for GT 23 of BIN 3 which was derived from frontal cortex cDNA, did not detect mRNA and 
yet showed overlap with five other clones of 100 examined. It also hybridized to 
clones originating from fetal brain cDNA from a different selection experiment. Cross 
hybridization did not occur from repetitive sequence, as all of these clones 
hybridized to a single EcoRI band in genomic DNAs. This does suggest a preferential
selection of this sequence through the process of hybridization to the immobilized 
genomic DNA or during amplification of the retrieved material. This preference was not 
evident with the tissue mix cDNA selection as GT 23 detected only two clones (both of 
which were not characterized further) of 100 tested, indicating that selection with a 
wider diversity of starting cDNAs may minimize the preferential retrieval of some
sequences.

Detailed Description Text (110) : 

In addition to the patterns of DNA and RNA hybridization, sequence analysis was 
performed to determine cDNA overlap, their coding potential and to search databases of 
sequenced genes for identity or similarity. Many clones appeared to have been derived 
from unprocessed RNA since they lacked consistent open reading frames. Potential or 
partial exons were detected in them using the SORFIND program. Out of 31 
non-overlapping clones, 5 showed identity with .alpha.-adducin and one, GT 161, was
identical with the expressed sequence tag HUMXT01095. 

Detailed Description Text (111) : 

One cDNA fragment appeared to detect additional sequences. For example, GT 161, which
showed identity to an expressed sequence tag, hybridized strongly to the 2A11 YAC DNA
digested with EcoRI and to a band of corresponding size in total human DNA (Table 3).
A less prominent hybridizing band was also observed in human DNA that corresponded in
intensity and size to one seen in a human-hamster hybrid containing chromosome 1 as
its only human material, suggesting this clone represents a portion of a gene which
may belong to a gene family (Table 3).

Detailed Description Text (127) : 

A combination of general purpose text processing software and local sequence analysis 
software was used to extract subsets of the large public data bases based on feature 
table entries for poly A sites and for 3' UTR regions. These subset data bases were 
searched using a complete dynamic programming algorithm. 
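
The text does not name the dynamic programming algorithm or its scoring scheme. As an illustration only, the following sketch computes a Smith-Waterman local alignment score, one common complete dynamic programming method for sequence comparison. The function name and the scores (match +1, mismatch -1, gap -1) are assumptions and are not taken from the source.

```python
def local_alignment_score(query, subject, match=1, mismatch=-1, gap=-1):
    """Best Smith-Waterman local alignment score with a linear gap penalty.

    Scoring values are illustrative assumptions, not those of the original search.
    """
    prev = [0] * (len(subject) + 1)  # previous row of the scoring matrix
    best = 0
    for q in query:
        curr = [0]  # column 0 of the current row
        for j, s in enumerate(subject, start=1):
            diag = prev[j - 1] + (match if q == s else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


# A consensus poly A signal scored against a short 3' UTR-like string.
print(local_alignment_score("AATAAA", "GGAATAAAGG"))  # 6
```

Only two rows of the matrix are kept in memory, which is sufficient when the score alone is needed and no traceback is reported.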

Detailed Description Text (132) : 

As part of our strategy to detect the transcriptional units originating from the 
region spanning 500 kb on either side of the D4S95 locus, we previously isolated 58 
cDNA segments. Three of the cDNA clones (GT 149, GT 63 and GT 70) (FIG. 14) were found
to correspond to the sequence of the HD gene. Using two of these nonoverlapping cDNAs
(GT 70 and GT 149), we screened a human frontal cortex cDNA library and identified two
larger cDNA clones (cD 70-2 and HD 149-101) (FIG. 14). HD 149-101 and cD 70-2 were
used to screen a number of other human cDNA libraries including those of retina,
frontal cortex, fetal brain, caudate, and muscle tissues. In addition, a 1 kb PCR
product corresponding to nucleotides 8000-9000 of the published sequence was also used
to screen the frontal cortex library. Additional cDNAs were identified including HD 12
and HD 14.

Detailed Description Text (138) : 

GT 70, GT 149 and cD 70-2 detected two mRNA transcripts in all tissues assessed
including total and/or poly A.sup.+ RNA from lymphoblast, frontal cortex (FIG. 17),
intestine, liver and lung (data not shown). Similarly two transcripts were seen in
total and poly A.sup.+ RNA from a number of cell lines including lymphoblast, CaCO-2,
Hep G2 (FIG. 17), HL 60 and 293S cells (data not shown). Using conditions
that discriminated between human and rodent transcripts, these mRNAs were also both
observed (data not shown) in the hybrid cell line GM 10115 containing chromosome 4 as
its only human component indicating that both transcripts originate from chromosome 4.
Furthermore, all hybridizing genomic bands detected by these cDNA fragments could be
accounted for between total human, chromosome 4 and YAC DNA (data not shown). This
information provides further evidence that the two messages in all likelihood 
correspond to a single HD gene. 

Detailed Description Text (139) : 

The larger mRNA is the predominant transcript in adult and fetal human brain compared 
to lymphoblasts and cell lines including Hep G2 and CaCO-2 where the smaller sized 
transcript is more abundant (FIG. 17). This was confirmed by densitometry analysis
which showed a decreased intensity of approximately 3 fold in the ratio of the smaller 
to the larger transcript in adult and fetal brain. In contrast, in lymphoblast and 
cell lines as noted and in human intestines, liver and lung, the smaller to larger 
transcript ratio was increased in intensity by at least 2 fold. The non-overlapping
2.4 kb HindIII and 1.4 kb PstI/EcoRI fragments of HD 14 were used in Northern blot
analysis and in contrast to the two transcripts detected with GT 70, GT 149 and cD
70-2, only the single larger 13.7 kb mRNA was detected (FIG. 17). 

Detailed Description Text (140) : 

The earlier finding that the GT 70 and GT 149 corresponding to the HD gene detected 
two different sized mRNA species (Experiment 3) prompted an investigation of the 
relationship between these two mRNA species. We uncovered partially overlapping but 
distinct cDNA clones which span 4164 bp (HD 12) and 5,710 bp (HD 14) respectively. The 
region of overlap between these two cDNAs and the HD sequence shows an identical 
protein coding sequence, but in HD 14 an additional 3,360 bp of non-coding sequence is
identified.

Detailed Description Text (141) : 

This experiment demonstrates that the identified cDNAs (HD 12 and HD 14) originate 
from a single gene by DNA-hybridization analysis, restriction mapping and sequencing. 
Several mechanisms can lead to generation of different mRNAs from the same gene. 
Differential splicing events, alternate use of transcription start sites, or the 
selection of different polyadenylation sites can lead to multiple mRNA species 
generated from the same genomic region. Our experiments show that differential 
polyadenylation results in a larger transcript detected by RNA hybridization. It is 
generally appreciated that the majority of eukaryotic mRNAs possess a poly A tract at 
their 3' terminus. The addition of poly A occurs post-transcriptionally in the nucleus 
and involves cleavage of the primary transcript and subsequent addition of poly A to 
the newly formed 3' end. The cis-acting sequence usually AATAAA, located 15-25 
nucleotides upstream of the poly A addition site, is highly conserved and critical for 
polyadenylation. Alterations within these cis-acting sequences can lead to the 
reduction or even abolition of 3' processing. Both the hexanucleotides seen in the HD 
12 and HD 14 cDNAs have substitutions within this consensus that would be predicted to 
reduce the cleavage of the primary transcript and subsequent addition of poly A to the 
newly formed 3' end. The AGTAAA hexanucleotide which is seen 5' of the poly A tail on
the HD 12 cDNA would be predicted to have significantly less (.about.30%) efficacy in
effecting cleavage and subsequent addition of poly A compared to mRNA with the
complete sequence AATAAA and yet for most tissues excluding brain this appears to be
the predominantly used signal. The hexanucleotide ATTAAA which is seen 5' to the poly
A of the larger cDNA (HD 14) is predicted to be more efficient relative to AGTAAA but
also would be predicted to have less (.about.70%) efficacy for processing and
addition of poly A to the newly formed 3' end than the consensus sequence.
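
The relationship described above between the hexanucleotide, its distance from the poly A addition site and its predicted efficacy can be sketched as a simple lookup. This is an illustration under stated assumptions: the function name is hypothetical, the efficacy values read the approximate 30% and 70% figures as fractions of the consensus efficacy, and distance is counted from the first base of the hexanucleotide to the addition site.

```python
# Approximate efficacies relative to the consensus, as read from the text (assumption).
SIGNAL_EFFICACY = {"AATAAA": 1.0, "ATTAAA": 0.7, "AGTAAA": 0.3}


def find_polya_signals(seq, addition_site, min_dist=15, max_dist=25):
    """List (hexamer, distance, efficacy) for signals upstream of the addition site.

    addition_site is a 0-based index into seq; distance is measured from the
    first base of the hexamer (a convention chosen for this sketch).
    """
    seq = seq.upper()
    hits = []
    for dist in range(min_dist, max_dist + 1):
        start = addition_site - dist
        if start < 0:
            continue  # window runs off the 5' end of the sequence
        hexamer = seq[start:start + 6]
        if hexamer in SIGNAL_EFFICACY:
            hits.append((hexamer, dist, SIGNAL_EFFICACY[hexamer]))
    return hits


# Synthetic example: an AGTAAA signal placed 20 nt upstream of position 50.
example = "C" * 30 + "AGTAAA" + "C" * 19
print(find_polya_signals(example, addition_site=50))  # [('AGTAAA', 20, 0.3)]
```

The example sequence is synthetic and does not reproduce any part of the HD 12 or HD 14 cDNAs.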

Detailed Description Text (147) : 

It is also, of course, possible to express genes encoding polypeptides in eukaryotic 
host cell cultures derived from multicellular organisms. Mammalian cell lines 
available as hosts for expression are known in the art and include many immortalized 
cell lines available from the American Type Culture Collection (ATCC), including HeLa
cells, Chinese hamster ovary (CHO) cells, baby hamster kidney (BHK) cells, and a
number of other cell lines. Suitable promoters for mammalian cells are also known in
the art and include viral (1978), Rous sarcoma virus (RSV), adenovirus (ADV), and
bovine papilloma virus (BPV). Mammalian cells may also require terminator sequences
and poly A addition sequences; enhancer sequences which increase expression may also
be included, and sequences which cause amplification of the gene may also be
desirable. These sequences are known in the art. Vectors suitable for replication in
mammalian cells may include viral replicons, or sequences which ensure integration of
the appropriate sequences encoding NANBV epitope into the host genome.

Detailed Description Paragraph Table (3) :

TABLE 3 . 

Clone EcoRI Frag RNA Hybridization GT Size (bp) Sizes Size and Distribution Sequence 

Analysis 

BIN 1A BIN 1B ##STR1## -650 -600 912 7.0 12.0 12.0 5.5 kb : W, Fl , C, B, Co absent
absent ##STR2## 65 207 2.9 absent DB search neg. 69 976 3.8 absent DB search neg. 
MER3c repeat, 12 bpGT repeat ##STR3## 573 -500 -600 9.5 9.5 9.0 absent absent ##STR4## 
166 -550 12.0 4.5 kb : similar to GT88 Not sequenced 88 -600 6.0 4.5 kb : similar to GT 
166 Not sequenced 149 584 6.0, 5.0 10 kb, 12 kb: K, Co, Fi, L, W, C DB search neg. 
Coding Potential Excellent, Predicted exon. BIN 2 66 165 644 600; 550 10.0 11.5, 4.2 
absent absent ##STR5## 87 536 8.5 absent DB search neg. 70 63 757 600 9.0, 8.5, 1.2 
9.0, 1.2 10.0, 12.0 kb, : L, F, C, W, B ND ##STR6## 54 757 2.7 absent DB search neg. 
ALU and MER18 repeats 72 764 2.8 absent DB search neg. 189 695; 578 11.0, 6.0 DB 
search neg. 2 partial ALU repeats, composite clone BIN 3A BIN 3B ##STR7## 551 592 532 
595 589 597 14.0 14.0 14.0 14.0 absent absent ##STR8## 136 -500 14.0, 7.5 absent Not 
sequenced 44 646 13.7 absent DB search neg. 48 550 14.0 absent DB search neg. ##STR9## 
516 -500 560 9.0 5.0 3.8 kb : W, L, F, C, Co ND ##STR10## 167 -500 6.4 absent Not 
sequenced. ##STR11## 490 -600 -500 -560 -450 13.0 15.0 7.0 14.0, 2.8, 0.5 10.0, 5.2 
4.0 kb: Adducin 4.0 kb : Adducin 4.0 kb: Adducin 4.0 kb : Adducin 4.0 kb : Adducin 24 
-600 15.0, 7.8, 6.0 12.0 kb DB search neg. 307 bp similar to L1 repeat. 30 458 6.0 DB
search neg. ALU repeat 138 -600 13.0 absent DB similarity, 4.2e-4, HSILIAG, Alu repeat 
present ##STR12## -550 550 8.0, 14.0 16.0, 14.0, 7.5 14.0 ND absent absent ##STR13## 
53 -550 16.0 absent DB search neg. 182 bp of L1 repeat BIN 4 128 480 14.0 absent DB
search neg ##STR14## 422 443:250 439 400 12.0.11.0 11.0.9.0.4.2 11.0 5.5 kb 5.5 kb 5.5 
kb ##STR15## 43 495 6.0 absent DB search neg. 495 bp ORF . coding potential 133 480 
14.0.18.0 absent DB search neg. BIN 3A BIN 5b ##STR16## 450 447 352 14.0.9.0 
14.0.9.0.4.1 14.0.9.0 3.6 kb : wide distribution 1.8,3.6 kb : FI, L, W, C, C o ND 
##STR17## 125 662 7.5 absent DB search neg. 137 500 9.0 absent DB search neg. 160 -500 
4.2 absent DB search neg (partial sequence). 179 bp ORF. Coding potential good. 161 
349 9.0 3.8 kb: DB match, 3.3e-102, HUMXT01095 (EST) is identical

Table 3 

Legend Summary of characterization of 58 retrieved cDNA fragments. The clones are
listed by name (as GT nos.) according to their physical interval or BIN assignment on
YAC clones. The sizes of the cDNA fragments are given in base pairs. The genomic
fragments detected with these clones in human and yeast DNAs digested with EcoRI are
also listed. Sizes of mRNAs detected in the tissues are given in kilobases from
K--Kidney, Co--Cos cells, Fi--fibroblasts, L--lymphoblast, W--HL60 cells, C--CaCO-2
cells, B--bone marrow, F--frontal cortex, FB--fetal brain. Groups of clones that are
shown bracketed indicate those that partially overlap as determined by cross
hybridization or sequence analysis. Database (DB) searches were carried out against
nonredundant nucleic acid and protein databases of NCBI, as well as the dbEST and
Transcription Factor databases. Characterized repeat sequences were edited prior to
BLAST searches. All database matches with BLAST expectation values less than 1 .times.
10.sup.-4 are reported, but similarities greater than 10.sup.-10 were considered
borderline. Coding potential was judged by the GRAIL email server and potential exons
were identified using the SORFIND program. Human repeat sequences were identified by 
the PYTHIA email server. 
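
The reporting rule in the legend can be expressed as a small classifier. This is a sketch of one reading of that rule; the function name and the category labels are illustrative and do not appear in the original.

```python
REPORT_THRESHOLD = 1e-4       # matches are reported below this expectation value
BORDERLINE_THRESHOLD = 1e-10  # reported matches above this value are borderline


def classify_expectation(e_value: float) -> str:
    """Classify a BLAST expectation value according to the Table 3 legend."""
    if e_value >= REPORT_THRESHOLD:
        return "not reported"
    if e_value > BORDERLINE_THRESHOLD:
        return "borderline"
    return "reported"


# GT 161 matched the EST HUMXT01095 with an expectation value of 3.3e-102.
print(classify_expectation(3.3e-102))  # reported
```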